How Many Topics? Stability Analysis for Topic Models

Files in This Item:
File: insight_publication.pdf
Size: 551.97 kB
Format: Adobe PDF
Title: How Many Topics? Stability Analysis for Topic Models
Authors: Greene, Derek; O'Callaghan, Derek; Cunningham, Pádraig
Date: 19-Sep-2014
Online since: 2015-06-21T12:24:57Z
Abstract: Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the over-clustering of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process.
Sponsorship: Science Foundation Ireland
Type of material: Conference Publication
Journal: Machine Learning and Knowledge Discovery in Databases. Proceedings Part I.
Start page: 498
End page: 513
Copyright (published version): 2014 Springer
Keywords: Statistics; Machine learning; Latent Dirichlet Allocation (LDA); Non-negative Matrix Factorization (NMF); Topic modeling; Corpora
DOI: 10.1007/978-3-662-44848-9_32
Language: en
Status of Item: Peer reviewed
Conference Details: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML '14), 15-19 September, Nancy, France
Appears in Collections: Computer Science Research Collection; Insight Research Collection


This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.