Ensemble Topic Modeling via Matrix Factorization
Author(s)
Date Issued
2016-09-21
Date Available
2017-02-13T15:06:25Z
Abstract
Topic models can provide insight into the underlying latent structure of a large corpus of documents, facilitating knowledge discovery and information summarization. A range of methods has been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization. However, these methods tend to have stochastic elements in their initialization, which can make their output unstable: if a topic modeling algorithm is applied to the same data multiple times, the output will not necessarily be the same each time. With this idea of stability in mind, we ask: how can we produce a definitive topic model that is both stable and accurate? To address this, we propose a new ensemble topic modeling method, based on Non-negative Matrix Factorization (NMF), which combines a collection of unstable topic models to produce a definitive output. We evaluate this method on an annotated tweet corpus, where we show that the new approach is more accurate and more stable than traditional NMF.
Sponsorship
Science Foundation Ireland
Type of Material
Conference Publication
Publisher
CEUR Workshop Proceedings
Volume
1751
Copyright (Published Version)
2016 the Authors
Subjects
Web versions
Language
English
Status of Item
Peer reviewed
Journal
Greene, D., Mac Namee, B. and Ross, R. (eds.). Proceedings of 24th Irish Conference on Artificial Intelligence and Cognitive Science (AICS'16)
Conference Details
24th Irish Conference on Artificial Intelligence and Cognitive Science (AICS'16), Dublin, Ireland, 20-21 September 2016
This item is made available under a Creative Commons License
File(s)
Name
insight_publication.pdf
Size
247.25 KB
Format
Adobe PDF
Checksum (MD5)
ed054949aef6df1055daf5c7c5e5c962