Diverging Divergences: Examining Variants of Jensen Shannon Divergence for Corpus Comparison Tasks
File(s)
File | Size | Format
---|---|---
LREC2020__Diverging_Divergences__Examining_Variants_of_Jensen_Shannon_Divergence_for_Corpus_Comparison_Tasks.pdf | 693.74 KB | PDF
Date Issued
13 May 2020
Date Available
26 October 2021
Abstract
Jensen-Shannon divergence (JSD) is a distribution similarity measurement widely used in natural language processing. In corpus comparison tasks, where keywords are extracted to reveal the divergence between different corpora (for example, social media posts from proponents of different views on a political issue), two variants of JSD have emerged in the literature. One of these uses a weighting based on the relative sizes of the corpora being compared. In this paper we argue that this weighting is unnecessary and, in fact, can lead to misleading results. We recommend that this weighted version not be used. We base this recommendation on an analysis of the JSD variants and on experiments showing how they affect corpus comparison results as the relative sizes of the corpora change.
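The two variants contrasted in the abstract can be sketched as follows. This is a toy illustration, not the paper's exact formulation: the word distributions and the 9:1 size ratio are invented for demonstration. Both variants instantiate JSD(P, Q) = H(w1·P + w2·Q) − w1·H(P) − w2·H(Q); the unweighted variant fixes w1 = w2 = 0.5, while the weighted variant sets w1, w2 proportional to the corpus sizes.

```python
import math

def entropy(p):
    """Shannon entropy (base 2) of a probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jsd(p, q, w1=0.5, w2=0.5):
    """Jensen-Shannon divergence with mixture weights w1, w2.

    w1 = w2 = 0.5 gives the unweighted variant; setting the weights
    proportional to corpus sizes gives the weighted variant discussed
    in the paper.
    """
    m = [w1 * a + w2 * b for a, b in zip(p, q)]
    return entropy(m) - w1 * entropy(p) - w2 * entropy(q)

# Invented word distributions for two corpora (for illustration only).
p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]

balanced = jsd(p, q)                # equal weights
skewed = jsd(p, q, w1=0.9, w2=0.1)  # weights from a 9:1 corpus-size ratio
```

Even though the two distributions are unchanged, the size-weighted variant reports a smaller divergence here (`skewed < balanced`), because the mixture is pulled toward the larger corpus. This sensitivity to relative corpus size is the kind of effect the paper's experiments examine.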
Sponsorship
Science Foundation Ireland
Teagasc
Type of Material
Conference Publication
Language
English
Status of Item
Peer reviewed
Conference Details
The 12th Language Resources and Evaluation Conference (LREC 2020), Marseille, France, 11-16 May 2020 (cancelled due to coronavirus outbreak)
This item is made available under a Creative Commons License
Views
332
Last Week
3
Last Month
4
Acquisition Date
Jun 6, 2023
Downloads
67
Last Month
7
Acquisition Date
Jun 6, 2023