How Many Topics? Stability Analysis for Topic Models
|Title:|How Many Topics? Stability Analysis for Topic Models|
|Authors:|Greene, Derek; O'Callaghan, Derek; Cunningham, Pádraig|
|Permanent link:|http://hdl.handle.net/10197/6617|
|Date:|19-Sep-2014|
|Online since:|2015-06-21T12:24:57Z|
|Abstract:|Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the over-clustering of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process.|
|Funding Details:|Science Foundation Ireland|
|Type of material:|Conference Publication|
|Journal:|Machine Learning and Knowledge Discovery in Databases. Proceedings Part I.|
|Start page:|498|
|End page:|513|
|Copyright (published version):|2014 Springer|
|Keywords:|Statistics; Machine learning; Latent Dirichlet Allocation (LDA); Non-negative Matrix Factorization (NMF); Topic modeling; Corpora|
|DOI:|10.1007/978-3-662-44848-9_32|
|Other versions:|http://www.ecmlpkdd2014.org/|
|Language:|en|
|Status of Item:|Peer reviewed|
|Conference Details:|European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML '14), 15-19 September, Nancy, France|
|This item is made available under a Creative Commons License:|https://creativecommons.org/licenses/by-nc-nd/3.0/ie/|
|Appears in Collections:|Computer Science Research Collection; Insight Research Collection|