University College Dublin
How Many Topics? Stability Analysis for Topic Models

Author(s)
Greene, Derek  
O'Callaghan, Derek  
Cunningham, Pádraig  
URI
http://hdl.handle.net/10197/6617
Date Issued
2014-09-19
Date Available
2015-06-21T12:24:57Z
Abstract
Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the over-clustering of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process.
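The idea in the abstract, that a well-chosen number of topics yields top-term lists that change little when the data is perturbed, can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not state the exact similarity measure, so the sketch assumes plain Jaccard overlap between unordered top-term sets and a brute-force one-to-one matching of topics. The function names (`jaccard`, `agreement`, `stability`) and the toy term lists are hypothetical.

```python
from itertools import permutations


def jaccard(terms_a, terms_b):
    """Jaccard overlap between two top-term lists, treated as sets (an assumption)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agreement(topics_a, topics_b):
    """Mean Jaccard overlap under the best one-to-one matching of topics.

    Brute force over all permutations, so only practical for a small
    number of topics; both models must have the same, non-zero, number.
    """
    k = len(topics_a)
    if k == 0 or k != len(topics_b):
        raise ValueError("both models need the same non-zero number of topics")
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(jaccard(topics_a[i], topics_b[j]) for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


def stability(reference, samples):
    """Average agreement between a reference model and models fitted on perturbed data."""
    return sum(agreement(reference, s) for s in samples) / len(samples)


# Toy top-term lists for k = 2 topics (made-up data, for illustration only).
reference = [["goal", "match", "team"], ["vote", "party", "election"]]
sample_1 = [["party", "vote", "poll"], ["team", "match", "coach"]]
sample_2 = [["goal", "match", "team"], ["vote", "party", "election"]]

score = stability(reference, [sample_1, sample_2])
```

In a model selection setting, one would compute such a score for each candidate number of topics and favour the values where it is highest; how the perturbed samples and topic models are produced is left out of this sketch.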
Other Sponsorship
Science Foundation Ireland
Type of Material
Conference Publication
Journal
Machine Learning and Knowledge Discovery in Databases. Proceedings Part I.
Start Page
498
End Page
513
Copyright (Published Version)
2014 Springer
Subjects

Statistics
Machine learning
Latent Dirichlet Allo...
Non-negative Matrix F...
Topic modeling
Corpora
DOI
10.1007/978-3-662-44848-9_32
Web versions
http://www.ecmlpkdd2014.org/
Language
English
Status of Item
Peer reviewed
Conference Details
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML '14), 15-19 September, Nancy, France
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
File(s)
Name
insight_publication.pdf
Size
551.97 KB
Format
Adobe PDF
Checksum (MD5)
9035903f010ddcfe6e07106d2614209d
Owning collection
Insight Research Collection
Mapped collections
Computer Science Research Collection

Item descriptive metadata is released under a CC-0 (public domain) license: https://creativecommons.org/public-domain/cc0/.
All other content is subject to copyright.
