How Many Topics? Stability Analysis for Topic Models
File(s)
File | Description | Size | Format |
---|---|---|---|
insight_publication.pdf | | 551.97 KB | |
Date Issued
19 September 2014
Date Available
21 June 2015, 12:24:57 UTC
Abstract
Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the over-clustering of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process.
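To make the stability idea in the abstract concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify how agreement between models is measured, so this sketch assumes a plain Jaccard similarity between top-term sets and a brute-force one-to-one matching of topics; the function names `jaccard`, `agreement`, and `stability` and the toy term lists are hypothetical. Each topic is represented only by its list of top terms, which is what makes the comparison term-centric.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity between two collections of terms (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agreement(topics_x, topics_y):
    """Best mean Jaccard over all one-to-one matchings of two topic lists.

    Brute force over permutations, so only practical for a handful of topics.
    """
    if len(topics_x) != len(topics_y):
        raise ValueError("both models must have the same number of topics")
    k = len(topics_x)
    if k == 0:
        raise ValueError("at least one topic is required")
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(jaccard(topics_x[i], topics_y[j]) for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


def stability(reference, perturbed_runs):
    """Mean agreement between a reference model and models fit on perturbed data."""
    if not perturbed_runs:
        raise ValueError("at least one perturbed run is required")
    return sum(agreement(reference, run) for run in perturbed_runs) / len(perturbed_runs)


# Toy data: top terms per topic for one reference model and two perturbed runs.
reference = [["goal", "match", "team"], ["vote", "party", "election"]]
runs = [
    [["party", "vote", "election"], ["team", "goal", "match"]],  # same topics, reordered
    [["goal", "match", "vote"], ["party", "election", "team"]],  # partially mixed topics
]
score = stability(reference, runs)
```

Under these assumptions, one would compute such a score for each candidate number of topics and prefer values where the score is high, since a well-chosen model should yield similar top terms when the data is perturbed.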
Other Sponsorship
Science Foundation Ireland
Type of Material
Conference Publication
Journal
Machine Learning and Knowledge Discovery in Databases. Proceedings Part I.
Start Page
498
End Page
513
Copyright (Published Version)
2014 Springer
Language
English
Status of Item
Peer reviewed
Description
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML '14), 15-19 September, Nancy, France
This item is made available under a Creative Commons License