Finding Niche Topics using Semi-Supervised Topic Modeling via Word Embeddings

Files in This Item: insight_publication.pdf (Adobe PDF, 1.07 MB)
Authors: Conheady, Gerald; Greene, Derek
Permanent link: http://hdl.handle.net/10197/10853
Date: 31-Jul-2017
Online since: 2019-07-08T08:57:26Z
Abstract: Topic modeling techniques generally focus on the discovery of the predominant thematic structures in text corpora. In contrast, a niche topic is made up of a small number of documents related to a common theme. Such a topic may have so few documents relative to the overall corpus size that it fails to be identified when using standard techniques. This paper proposes a new process, called Niche+, for finding these kinds of niche topics. It assumes interactions with a user who can provide a strictly limited level of supervision, which is subsequently employed in semi-supervised matrix factorization. Furthermore, word embeddings are used to provide additional weakly-labeled data. Experimental results show that documents in niche topics can be successfully identified using Niche+. These results are further supported via a use case that explores a real-world company email database.
Funding Details: Science Foundation Ireland
Type of material: Conference Publication
Publisher: CEUR-WS.org
Start page: 36
End page: 48
Keywords: Modeling techniques; Niche+; Word embeddings; Text corpus exploration; Topic modeling
Other versions: https://dblp.org/db/conf/aics/aics2017
Language: en
Status of Item: Peer reviewed
Is part of: McAuley, J., McKeever, S. (eds.). Proceedings of the 25th Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland, December 7 - 8, 2017. CEUR Workshop Proceedings 2086, CEUR-WS.org 2018
Conference Details: AICS 2017: 25th Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland, 7-8 December 2017
Appears in Collections: Insight Research Collection
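The abstract describes two ingredients. The first is a strictly limited amount of user supervision fed into semi-supervised matrix factorization. The second is word embeddings that supply additional weakly-labeled documents. The Python sketch below illustrates that general idea under stated assumptions. It is not the Niche+ procedure from the paper. The helper names `weak_labels` and `guided_nmf`, the penalty formulation, the parameter `lam`, and the use of one vector per document (for example, averaged word embeddings) are all hypothetical choices made for illustration. The factorization minimizes ||X - WH||_F^2 + lam * ||M * (W - W0)||_F^2 with multiplicative updates, where M masks the labelled documents and W0 assigns them to a designated niche-topic column.

```python
import numpy as np


def weak_labels(doc_vecs, seed_idx, k):
    """Hypothetical helper: return the indices of the k documents (excluding
    the seeds) whose vectors are most cosine-similar to the seed centroid."""
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    unit = doc_vecs / np.maximum(norms, 1e-12)
    centroid = unit[seed_idx].mean(axis=0)
    sims = unit @ centroid
    sims[seed_idx] = -np.inf  # never re-select a document the user labelled
    return np.argsort(-sims)[:k]


def guided_nmf(X, n_topics, labeled_idx, niche_col=0, lam=10.0,
               n_iter=300, seed=0):
    """Hypothetical seed-guided NMF: minimise
    ||X - WH||_F^2 + lam * ||M * (W - W0)||_F^2
    using Lee-Seung style multiplicative updates. M masks the labelled rows
    and W0 places all of their weight on the niche topic column."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, n_topics))
    H = rng.random((n_topics, n_terms))
    M = np.zeros_like(W)
    M[labeled_idx, :] = 1.0           # penalise every column of labelled rows
    W0 = np.zeros_like(W)
    W0[labeled_idx, niche_col] = 1.0  # target: full weight on the niche topic
    eps = 1e-10                       # guards against division by zero
    for _ in range(n_iter):
        # Standard multiplicative update for H (Frobenius loss).
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # The penalty gradient splits into lam*M*W (positive part, denominator)
        # and lam*M*W0 (negative part, numerator), which keeps W non-negative.
        W *= (X @ H.T + lam * M * W0) / (W @ H @ H.T + lam * M * W + eps)
    return W, H
```

In this sketch, a user would label one seed document. `weak_labels` would then add its nearest neighbours in embedding space, and both groups would be passed as `labeled_idx` to `guided_nmf`. Penalising the labelled rows, rather than fixing them outright, keeps the updates multiplicative. `lam` then controls how strongly the limited supervision outweighs reconstruction error.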

This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland. No item may be reproduced for commercial purposes. For other possible restrictions on use please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.