University College Dublin

ANNOTATE: orgANizing uNstructured cOntenTs viA Topic labEls

Author(s)
Ajwani, Deepak  
Taneva, Bilyana  
Dutta, Sourav  
et al.  
Uri
http://hdl.handle.net/10197/9888
Date Issued
2018-12-13
Date Available
2019-04-10T11:06:51Z
Abstract
With the advent of the Big Data paradigm, filtering, retrieval, and linking of unstructured multi-modal data have become a necessity. Assigning topic labels that accurately capture the meaning and contextual information of content is a fundamental problem in organizing unstructured data. Using manually assigned tags for this purpose introduces inconsistencies because of different "surface forms". On the other hand, existing automated approaches either use hierarchical multi-label classification or are unsupervised and rely on (undirected) graph measures that leverage taxonomies. While the former requires a large training data set to learn the characteristics of each topic class, the latter lacks the flexibility to learn a broad range of related topics and is less accurate. We propose a novel framework, ANNOTATE, based on a small set of features and directed traversal of taxonomies, to learn a broad spectrum of related topics using limited training data. We also show that our approach provides accurate labels for several domains without the need for re-training. For instance, the framework, trained on a small set of BBC news articles, exhibits close matches to user-generated tags for Quora documents. Experimental results for the same model on news classification and on identifying aspects of Amazon product reviews, based on Amazon Mechanical Turk evaluation, show our approach to be significantly better than the state of the art. We further present real-life case studies of our proposed framework for automatically tagging Quora posts and for topically segmenting, indexing, and linking related YouTube videos (using our publicly available Chrome browser extension).
Type of Material
Conference Publication
Publisher
IEEE
Start Page
1699
End Page
1708
Copyright (Published Version)
2018 IEEE
Subjects

Taxonomy

Labeling

Semantics

Encyclopedias

Electronic publishing...

Internet

DOI
10.1109/BigData.2018.8622647
Web versions
http://cci.drexel.edu/bigdata/bigdata2018/
Language
English
Status of Item
Not peer reviewed
Journal
Abe, N., Liu, H., Pu, C. et al. (eds.). Proceedings: 2018 IEEE International Conference on Big Data, Dec 10 - Dec 13, 2018, Seattle, WA, USA
Conference Details
2018 IEEE International Conference on Big Data, Seattle, United States of America, 10-13 December 2018
ISBN
9781538650356
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
File(s)
Name
ajwani_bigdata18.pdf
Size
693.19 KB
Format
Adobe PDF
Checksum (MD5)
27c0a9838e9ac38eab3f4e09a0584d0e
Owning collection
Computer Science Research Collection

Item descriptive metadata is released under a CC-0 (public domain) license: https://creativecommons.org/public-domain/cc0/.
All other content is subject to copyright.

For all queries please contact research.repository@ucd.ie.
