An Analysis of the Coherence of Descriptors in Topic Modeling

Files in This Item:
File: insight_publication.pdf (477.98 kB, Adobe PDF)
Title: An Analysis of the Coherence of Descriptors in Topic Modeling
Authors: O'Callaghan, Derek; Greene, Derek; Carthy, Joe; Cunningham, Pádraig
Permanent link: http://hdl.handle.net/10197/6482
Date: 1-Aug-2015
Abstract: In recent years, topic modeling has become an established method in the analysis of text corpora, with probabilistic techniques such as latent Dirichlet allocation (LDA) commonly employed for this purpose. However, it might be argued that adequate attention is often not paid to the issue of topic coherence, the semantic interpretability of the top terms usually used to describe discovered topics. Nevertheless, a number of studies have proposed measures for analyzing such coherence, where these have been largely focused on topics found by LDA, with matrix decomposition techniques such as Non-negative Matrix Factorization (NMF) being somewhat overlooked in comparison. This motivates the current work, where we compare and analyze topics found by popular variants of both NMF and LDA in multiple corpora in terms of both their coherence and associated generality, using a combination of existing and new measures, including one based on distributional semantics. Two out of three coherence measures find NMF to regularly produce more coherent topics, with higher levels of generality and redundancy observed with the LDA topic descriptors. In all cases, we observe that the associated term weighting strategy plays a major role. The results observed with NMF suggest that this may be a more suitable topic modeling method when analyzing certain corpora, such as those associated with niche or non-mainstream domains.
Type of material: Journal Article
Publisher: Elsevier
Copyright (published version): 2015 Elsevier
Keywords: Machine learning; Statistics; Topic modeling; Topic coherence; LDA; NMF
DOI: 10.1016/j.eswa.2015.02.055
Language: en
Status of Item: Peer reviewed
Appears in Collections: Computer Science Research Collection; Insight Research Collection

This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this item is made available, or to notes contained in the item itself. Other terms may apply.