An Analysis of the Coherence of Descriptors in Topic Modeling
Date Issued
2015-08-01
Date Available
2015-04-14T11:24:54Z
Abstract
In recent years, topic modeling has become an established method in the analysis of text corpora, with probabilistic techniques such as latent Dirichlet allocation (LDA) commonly employed for this purpose. However, it might be argued that adequate attention is often not paid to the issue of topic coherence, the semantic interpretability of the top terms usually used to describe discovered topics. Nevertheless, a number of studies have proposed measures for analyzing such coherence, although these have largely focused on topics found by LDA, with matrix decomposition techniques such as Non-negative Matrix Factorization (NMF) being somewhat overlooked in comparison. This motivates the current work, in which we compare and analyze topics found by popular variants of both NMF and LDA in multiple corpora in terms of both their coherence and associated generality, using a combination of existing and new measures, including one based on distributional semantics. Two of the three coherence measures find that NMF regularly produces more coherent topics, while higher levels of generality and redundancy are observed in the LDA topic descriptors. In all cases, we observe that the associated term weighting strategy plays a major role. The results observed with NMF suggest that it may be a more suitable topic modeling method when analyzing certain corpora, such as those associated with niche or non-mainstream domains.
Other Sponsorship
Science Foundation Ireland
Type of Material
Journal Article
Publisher
Elsevier
Journal
Expert Systems with Applications
Volume
42
Issue
13
Start Page
5645
End Page
5657
Copyright (Published Version)
2015 Elsevier
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
Scopus© citations
214
Acquisition Date
Mar 28, 2024
Views
2647
Last Month
1
Acquisition Date
Mar 28, 2024
Downloads
2842
Last Week
11
Last Month
50
Acquisition Date
Mar 28, 2024