Ensemble Topic Modeling via Matrix Factorization

Belford, Mark; Mac Namee, Brian; Greene, Derek

Author(s)

Belford, Mark

Mac Namee, Brian

Greene, Derek

URI

http://hdl.handle.net/10197/8336

Date Issued

2016-09-21

Date Available

2017-02-13T15:06:25Z

Abstract

Topic models can provide insight into the underlying latent structure of a large corpus of documents, facilitating knowledge discovery and information summarization. A range of methods has been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization. However, these methods tend to have stochastic elements in their initialization, which can make their output unstable. That is, if a topic modeling algorithm is applied to the same data multiple times, the output will not necessarily be the same each time. With this idea of stability in mind, we ask: how can we produce a definitive topic model that is both stable and accurate? To address this, we propose a new ensemble topic modeling method, based on Non-negative Matrix Factorization (NMF), which combines a collection of unstable topic models to produce a definitive output. We evaluate this method on an annotated tweet corpus, where we show that this new approach is more accurate and stable than traditional NMF.
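The abstract does not spell out how the unstable base models are combined. The sketch below assumes one plausible two-step scheme and is not presented as the authors' exact algorithm. In the first step, several NMF models are generated from different random initializations. In the second step, their topic-term vectors are stacked into a single matrix and factorized again, giving k ensemble topics. The function names `nmf` and `ensemble_nmf`, the parameters `runs` and `seed`, the unit-length scaling of base topics, and the use of NumPy are all illustrative assumptions. The integration step here uses a fixed seed to stay deterministic. A full implementation might instead use a deterministic initialization such as NNDSVD.

```python
import numpy as np


def nmf(X, k, seed, n_iter=500, eps=1e-9):
    """Plain NMF (X ~ W @ H) via Lee-Seung multiplicative updates for the
    Frobenius loss. The random initialization is the stochastic element that
    makes repeated runs on the same data disagree."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def ensemble_nmf(X, k, runs=10, seed=0):
    """Hypothetical two-step ensemble (an assumption, not the paper's code):
    (1) generate `runs` randomly initialized NMF models,
    (2) stack their topic-term rows and factorize them into k topics."""
    rng = np.random.default_rng(seed)
    # Generation step: k topic-term vectors from each independent base run.
    base_topics = [nmf(X, k, seed=int(rng.integers(2**31)))[1] for _ in range(runs)]
    M = np.vstack(base_topics)  # shape: (runs * k, n_terms)
    # Scale each base topic to unit length so no single run dominates.
    M /= np.linalg.norm(M, axis=1, keepdims=True) + 1e-12
    # Integration step: factorize the stacked topics into k ensemble topics.
    # A fixed seed keeps this step deterministic (assumed stand-in for NNDSVD).
    _, H_ens = nmf(M, k, seed=0)
    return H_ens  # shape: (k, n_terms)


# Toy corpus: 30 documents, 15 terms, 3 planted topics over disjoint term blocks.
X = np.kron(np.eye(3), np.ones((10, 5))) + 0.01 * np.random.default_rng(42).random((30, 15))
H = ensemble_nmf(X, k=3)
print(H.shape)  # (3, 15)
print(sorted((H.argmax(axis=1) // 5).tolist()))  # term block holding each topic's top term
```

Individual calls to `nmf` with different seeds can return topics in a different order and with different weights. By contrast, `ensemble_nmf` with a fixed `seed` always returns the same topics for the same input. Stability in this toy sense comes from aggregating many base topics before the final factorization. It does not come from any single run.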

Sponsorship

Science Foundation Ireland

Type of Material

Conference Publication

Publisher

CEUR Workshop Proceedings

Volume

1751

Copyright (Published Version)

2016 the Authors

Subjects

Machine learning

Statistics

Web versions

http://ceur-ws.org/Vol-1751/

Language

English

Status of Item

Peer reviewed

Journal

Greene, D., Mac Namee, B. and Ross, R. (eds.). Proceedings of the 24th Irish Conference on Artificial Intelligence and Cognitive Science (AICS'16)

Conference Details

24th Irish Conference on Artificial Intelligence and Cognitive Science (AICS'16), Dublin, Ireland, 20-21 September 2016

This item is made available under a Creative Commons License

https://creativecommons.org/licenses/by-nc-nd/3.0/ie/

Owning collection

Insight Research Collection