How Many Topics? Stability Analysis for Topic Models
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Greene, Derek | |
dc.contributor.author | O'Callaghan, Derek | |
dc.contributor.author | Cunningham, Pádraig | |
dc.date.accessioned | 2015-06-21T12:24:57Z | |
dc.date.available | 2015-06-21T12:24:57Z | |
dc.date.copyright | 2014 Springer | |
dc.date.issued | 2014-09-19 | |
dc.identifier.citation | Machine Learning and Knowledge Discovery in Databases. Proceedings Part I. | |
dc.identifier.uri | http://hdl.handle.net/10197/6617 | |
dc.description | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML '14), 15-19 September, Nancy, France | en |
dc.description.abstract | Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the over-clustering of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process. | en |
dc.language.iso | en | en |
dc.rights | The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-662-44848-9_32 | |
dc.subject | Statistics | en |
dc.subject | Machine learning | en |
dc.subject | Latent Dirichlet Allocation (LDA) | en |
dc.subject | Non-negative Matrix Factorization (NMF) | en |
dc.subject | Topic modeling | en |
dc.subject | Corpora | en |
dc.title | How Many Topics? Stability Analysis for Topic Models | en |
dc.type | Conference Publication | en |
dc.internal.webversions | http://www.ecmlpkdd2014.org/ | - |
dc.status | Peer reviewed | en |
dc.identifier.startpage | 498 | |
dc.identifier.endpage | 513 | |
dc.identifier.doi | 10.1007/978-3-662-44848-9_32 | - |
dc.neeo.contributor | Greene|Derek|aut| | - |
dc.neeo.contributor | O'Callaghan|Derek|aut| | - |
dc.neeo.contributor | Cunningham|Pádraig|aut| | - |
dc.description.othersponsorship | Science Foundation Ireland | en |
dc.date.updated | 2015-05-27T15:00:52Z | |
dc.rights.license | https://creativecommons.org/licenses/by-nc-nd/3.0/ie/ | en |
item.fulltext | With Fulltext | - |
item.grantfulltext | open | - |
Appears in Collections: | Computer Science Research Collection; Insight Research Collection |
Files in This Item:
File | Size | Format |
---|---|---|
insight_publication.pdf | 551.97 kB | Adobe PDF |