How Many Topics? Stability Analysis for Topic Models

DC Field: Value [Language]
dc.contributor.author: Greene, Derek
dc.contributor.author: O'Callaghan, Derek
dc.contributor.author: Cunningham, Pádraig
dc.date.accessioned: 2015-06-21T12:24:57Z
dc.date.available: 2015-06-21T12:24:57Z
dc.date.copyright: 2014 Springer
dc.date.issued: 2014-09-19
dc.identifier.citation: Machine Learning and Knowledge Discovery in Databases. Proceedings Part I.
dc.identifier.uri: http://hdl.handle.net/10197/6617
dc.description: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML '14), 15-19 September, Nancy, France [en]
dc.description.abstract: Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the over-clustering of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process. [en]
dc.language.iso: en [en]
dc.rights: The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-662-44848-9_32
dc.subject: Statistics [en]
dc.subject: Machine learning [en]
dc.subject: Latent Dirichlet Allocation (LDA) [en]
dc.subject: Non-negative Matrix Factorization (NMF) [en]
dc.subject: Topic modeling [en]
dc.subject: Corpora [en]
dc.title: How Many Topics? Stability Analysis for Topic Models [en]
dc.type: Conference Publication [en]
dc.internal.webversions: http://www.ecmlpkdd2014.org/ [-]
dc.status: Peer reviewed [en]
dc.identifier.startpage: 498
dc.identifier.endpage: 513
dc.identifier.doi: 10.1007/978-3-662-44848-9_32 [-]
dc.neeo.contributor: Greene|Derek|aut| [-]
dc.neeo.contributor: O'Callaghan|Derek|aut| [-]
dc.neeo.contributor: Cunningham|Pádraig|aut| [-]
dc.description.othersponsorship: Science Foundation Ireland [en]
dc.date.updated: 2015-05-27T15:00:52Z
dc.rights.license: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/ [en]
item.fulltext: With Fulltext [-]
item.grantfulltext: open [-]
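
The abstract's idea of term-centric stability can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each model is summarised by its top-term lists, that agreement is plain Jaccard overlap under the best one-to-one pairing of topics, and that the perturbed models have already been fitted elsewhere (for example on resampled documents). The function names `jaccard`, `agreement`, `stability` and `select_k` are hypothetical.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard overlap between two collections of terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agreement(reference, other):
    """Best mean Jaccard over all one-to-one pairings of topics.

    Both arguments are lists of k top-term lists. Brute force over
    permutations is an assumption for brevity and only suits small k.
    """
    k = len(reference)
    if k == 0 or len(other) != k:
        raise ValueError("both models need the same non-zero number of topics")
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(jaccard(reference[i], other[j]) for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


def stability(reference, perturbed_models):
    """Mean agreement between a reference model and models from perturbed data."""
    if not perturbed_models:
        raise ValueError("at least one perturbed model is required")
    total = sum(agreement(reference, model) for model in perturbed_models)
    return total / len(perturbed_models)


def select_k(candidates):
    """Pick the k with the highest stability.

    candidates maps k -> (reference topics, list of perturbed-model topics).
    Returns the chosen k and the score for every candidate.
    """
    scores = {k: stability(ref, runs) for k, (ref, runs) in candidates.items()}
    return max(scores, key=scores.get), scores
```

Under these assumptions, a value of k whose top terms barely change across perturbed samples scores close to 1, while an unsuitable k produces topics that shift between runs and scores lower.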
Appears in Collections: Computer Science Research Collection; Insight Research Collection
Files in This Item:
insight_publication.pdf (Size: 551.97 kB, Format: Adobe PDF)

Scopus citations: 55 (last week: 0; last month: 1), checked on Sep 12, 2020
Page views: 1,796 (last week: 2; last month: 14), checked on May 16, 2022
Downloads: 377, checked on May 16, 2022
