Synthetic Dataset Generation for Online Topic Modeling

DC Field: Value [Language]
dc.contributor.author: Belford, Mark
dc.contributor.author: MacNamee, Brian
dc.contributor.author: Greene, Derek
dc.date.accessioned: 2019-07-03T07:45:47Z
dc.date.available: 2019-07-03T07:45:47Z
dc.date.copyright: 2017 the Author [en_US]
dc.date.issued: 2018-04-12
dc.identifier.uri: http://hdl.handle.net/10197/10845
dc.description: AICS 2017: 25th Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland, 7 - 8 December 2017 [en_US]
dc.description.abstract: Online topic modeling allows for the discovery of the underlying latent structure in a real-time stream of data. In the evaluation of such approaches, it is common to choose a static value for the number of topics. However, we would expect the number of topics to vary over time due to changes in the underlying structure of the data, known as concept drift and concept shift. We propose a semi-synthetic dataset generator, which can introduce concept drift and concept shift into existing annotated non-temporal datasets via user-controlled parameterization. This allows for the creation of multiple different artificial streams of data, where the “correct” number and composition of the topics are known at each point in time. We demonstrate how these generated datasets can be used as an evaluation strategy for online topic modeling approaches. [en_US]
dc.description.sponsorship: Science Foundation Ireland [en_US]
dc.language.iso: en [en_US]
dc.publisher: CEUR-WS.org [en_US]
dc.relation.ispartof: McAuley, J., McKeever, S. (eds.). Proceedings of the 25th Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland, December 7 - 8, 2017 [en_US]
dc.subject: Machine Learning & Statistics [en_US]
dc.subject: Online topic modeling [en_US]
dc.subject: Semi-synthetic dataset generator [en_US]
dc.subject: Parameterization [en_US]
dc.title: Synthetic Dataset Generation for Online Topic Modeling [en_US]
dc.type: Conference Publication [en_US]
dc.internal.webversions: https://dblp.org/db/conf/aics/aics2017
dc.status: Peer reviewed [en_US]
dc.identifier.startpage: 63 [en_US]
dc.identifier.endpage: 75 [en_US]
dc.check.date: 2020-01-05
dc.neeo.contributor: Belford|Mark|aut|
dc.neeo.contributor: MacNamee|Brian|aut|
dc.neeo.contributor: Greene|Derek|aut|
dc.description.othersponsorship: Insight Research Centre [en_US]
dc.date.updated: 2019-07-02T13:41:32Z
dc.identifier.grantid: SFI/12/RC/2289
item.fulltext: With Fulltext
item.grantfulltext: open
Appears in Collections: Insight Research Collection
Files in This Item:
insight_publication.pdf (308.82 kB, Adobe PDF)

This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland license. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.