Diverging Divergences: Examining Variants of Jensen Shannon Divergence for Corpus Comparison Tasks

DC Field                        Value  Language
dc.contributor.author           Lu, Jinghui  -
dc.contributor.author           Henchion, Maeve  -
dc.contributor.author           MacNamee, Brian  -
dc.date.accessioned             2021-10-26T11:13:30Z  -
dc.date.available               2021-10-26T11:13:30Z  -
dc.date.issued                  2020-05-13  -
dc.identifier.uri               http://hdl.handle.net/10197/12574  -
dc.description                  The 12th Language Resources and Evaluation Conference (LREC 2020), Marseille, France, 11-16 May 2020 (cancelled due to coronavirus outbreak)  en_US
dc.description.abstract         Jensen-Shannon divergence (JSD) is a distribution similarity measurement widely used in natural language processing. In corpus comparison tasks, where keywords are extracted to reveal the divergence between different corpora (for example, social media posts from proponents of different views on a political issue), two variants of JSD have emerged in the literature. One of these uses a weighting based on the relative sizes of the corpora being compared. In this paper we argue that this weighting is unnecessary and, in fact, can lead to misleading results. We recommend that this weighted version is not used. We base this recommendation on an analysis of the JSD variants and experiments showing how they impact corpus comparison results as the relative sizes of the corpora being compared change.  en_US
dc.description.sponsorship      Science Foundation Ireland  en_US
dc.description.sponsorship      Teagasc  en_US
dc.language.iso                 en  en_US
dc.subject                      Corpus comparison  en_US
dc.subject                      Jensen-Shannon divergence  en_US
dc.title                        Diverging Divergences: Examining Variants of Jensen Shannon Divergence for Corpus Comparison Tasks  en_US
dc.type                         Conference Publication  en_US
dc.internal.authorcontactother  brian.macnamee@ucd.ie  en_US
dc.internal.webversions         https://lrec2020.lrec-conf.org/en/  -
dc.internal.webversions         https://aclanthology.org/2020.lrec-1.832/  -
dc.status                       Peer reviewed  en_US
dc.neeo.contributor             Lu|Jinghui|aut|  -
dc.neeo.contributor             Henchion|Maeve|aut|  -
dc.neeo.contributor             MacNamee|Brian|aut|  -
dc.date.updated                 2021-01-22T23:09:20Z  -
dc.identifier.grantid           2016053  -
dc.identifier.grantid           2016053  -
dc.identifier.grantid           12/RC/2289 P2  -
dc.rights.license               https://creativecommons.org/licenses/by/3.0/ie/  en_US
item.fulltext                   With Fulltext  -
item.grantfulltext              open  -
Appears in Collections: Computer Science Research Collection; Insight Research Collection
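The abstract contrasts an unweighted JSD with a variant weighted by the relative sizes of the corpora. The sketch below illustrates that distinction under stated assumptions; it is not the authors' code. It assumes the generalised form H(pi_a*P + pi_b*Q) - pi_a*H(P) - pi_b*H(Q) in bits, with equal weights for the unweighted variant and each corpus's share of all tokens for the weighted one. The function names (`entropy`, `normalise`, `jsd`, `size_weights`) and the toy corpora are hypothetical.

```python
import math
from collections import Counter


def entropy(dist):
    """Shannon entropy, in bits, of a dict mapping word -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def normalise(counts):
    """Turn raw word counts into a relative-frequency distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jsd(p, q, pi_p=0.5, pi_q=0.5):
    """Generalised JSD: H(pi_p*P + pi_q*Q) - pi_p*H(P) - pi_q*H(Q).

    Equal weights (the defaults) give the unweighted variant.
    """
    vocab = set(p) | set(q)
    m = {w: pi_p * p.get(w, 0.0) + pi_q * q.get(w, 0.0) for w in vocab}
    return entropy(m) - pi_p * entropy(p) - pi_q * entropy(q)


def size_weights(counts_a, counts_b):
    """Assumed size-based weights: each corpus's share of all tokens."""
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    return n_a / (n_a + n_b), n_b / (n_a + n_b)


# Toy corpora with hypothetical counts. Corpus B keeps the same word
# proportions while its total size is multiplied by k.
corpus_a = Counter({"tax": 30, "jobs": 50, "climate": 20})
corpus_b_base = Counter({"tax": 10, "jobs": 20, "climate": 70})

for k in (1, 10, 100):
    corpus_b = Counter({w: c * k for w, c in corpus_b_base.items()})
    p, q = normalise(corpus_a), normalise(corpus_b)
    pi_a, pi_b = size_weights(corpus_a, corpus_b)
    print(f"k={k:>3}  unweighted={jsd(p, q):.4f}  "
          f"weighted={jsd(p, q, pi_a, pi_b):.4f}")
```

Because the word distributions P and Q do not change as k grows, the unweighted score is the same for every k (up to floating-point rounding). In this toy example the size-weighted score shrinks as corpus B comes to dominate the mixture, since the weighted divergence is bounded above by the binary entropy of the weights. This shows a property of the formula only; it does not reproduce the paper's experiments.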
