University College Dublin
An Analysis of the Coherence of Descriptors in Topic Modeling

Author(s)
O'Callaghan, Derek  
Greene, Derek  
Carthy, Joe  
Cunningham, Pádraig  
Uri
http://hdl.handle.net/10197/6482
Date Issued
2015-08-01
Date Available
2015-04-14T11:24:54Z
Abstract
In recent years, topic modeling has become an established method in the analysis of text corpora, with probabilistic techniques such as latent Dirichlet allocation (LDA) commonly employed for this purpose. However, it might be argued that adequate attention is often not paid to the issue of topic coherence, the semantic interpretability of the top terms usually used to describe discovered topics. Nevertheless, a number of studies have proposed measures for analyzing such coherence, where these have been largely focused on topics found by LDA, with matrix decomposition techniques such as Non-negative Matrix Factorization (NMF) being somewhat overlooked in comparison. This motivates the current work, where we compare and analyze topics found by popular variants of both NMF and LDA in multiple corpora in terms of both their coherence and associated generality, using a combination of existing and new measures, including one based on distributional semantics. Two out of three coherence measures find NMF to regularly produce more coherent topics, with higher levels of generality and redundancy observed with the LDA topic descriptors. In all cases, we observe that the associated term weighting strategy plays a major role. The results observed with NMF suggest that this may be a more suitable topic modeling method when analyzing certain corpora, such as those associated with niche or non-mainstream domains.
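As a rough illustration of what scoring the coherence of a topic descriptor can look like, the sketch below computes the mean pairwise normalized pointwise mutual information (NPMI) of a topic's top terms from document-level co-occurrence counts. The abstract does not name the measures that were compared, so NPMI is used here only as one commonly used co-occurrence-based option; the function name `npmi_coherence`, the toy corpus, and the handling of edge cases are all assumptions made for this example and are not taken from the paper.

```python
import math
from itertools import combinations


def npmi_coherence(descriptor, documents):
    """Mean pairwise NPMI of the terms in one topic descriptor.

    descriptor: list of top terms describing a topic (at least two).
    documents:  list of tokenized documents (each a list of terms).
    Illustrative sketch only; not the measure defined in the paper.
    """
    if len(descriptor) < 2:
        raise ValueError("a descriptor needs at least two terms")
    if not documents:
        raise ValueError("at least one document is required")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*terms):
        # Fraction of documents containing all of the given terms.
        return sum(all(t in d for t in terms) for d in doc_sets) / n_docs

    scores = []
    for a, b in combinations(descriptor, 2):
        p_a, p_b, p_ab = prob(a), prob(b), prob(a, b)
        if p_ab == 0.0:
            # Terms never co-occur: NPMI tends to its minimum of -1.
            scores.append(-1.0)
        elif p_ab == 1.0:
            # Both terms occur in every document: NPMI is undefined (0/0).
            # Treated as 0.0 here by assumption.
            scores.append(0.0)
        else:
            pmi = math.log(p_ab / (p_a * p_b))
            scores.append(pmi / -math.log(p_ab))
    return sum(scores) / len(scores)


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["topic", "model", "lda"],
    ["topic", "model", "nmf"],
    ["football", "goal"],
    ["football", "match", "goal"],
]
coherent = npmi_coherence(["topic", "model"], docs)
incoherent = npmi_coherence(["topic", "football"], docs)
```

In this toy corpus the pair that always appears together scores higher than the pair that never co-occurs. A real comparison of NMF and LDA descriptors would use a large reference corpus and the top terms produced by each fitted model.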
Other Sponsorship
Science Foundation Ireland
Type of Material
Journal Article
Publisher
Elsevier
Journal
Expert Systems with Applications
Volume
42
Issue
13
Start Page
5645
End Page
5657
Copyright (Published Version)
2015 Elsevier
Subjects
Machine learning
Statistics
Topic modeling
Topic coherence
LDA
NMF
DOI
10.1016/j.eswa.2015.02.055
Web versions
https://www.insight-centre.org/UCD%20Repository
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
File(s)
Name
insight_publication.pdf
Size
477.98 KB
Format
Adobe PDF
Checksum (MD5)
0835222441a45a7b2dfad72845f711a2
Owning collection
Insight Research Collection
Mapped collections
Computer Science Research Collection

