University College Dublin

Classification for Crisis-Related Tweets Leveraging Word Embeddings and Data Augmentation

Author(s)
Wang, Congcong  
Lillis, David  
URI
http://hdl.handle.net/10197/25817
Date Issued
2019-11-15
Date Available
2024-05-02T15:58:33Z
Abstract
This paper presents University College Dublin’s (UCD) work at the TREC 2019-B Incident Streams (IS) track. The purpose of the IS track is to find actionable messages, and to estimate their priority, among a stream of crisis-related tweets. Based on the track’s requirements, we break the task down into two sub-tasks. The first is a multi-label classification task that categorises incoming tweets into different aid requests. The second is a single-label classification task that assigns each tweet one of four priority levels. For the track, we submitted four runs, each of which uses a different model for the tasks. Our baseline run trains classification models on hand-crafted features using machine learning methods, namely Logistic Regression and Naïve Bayes. Our other three runs train classification models with different deep learning methods: a vanilla bidirectional long short-term memory recurrent neural network (biLSTM), an adapted biLSTM, and a bi-attentive classification network (BCN) with pre-trained contextualised ELMo embeddings. Across the runs, we apply different word embeddings (in-domain pre-trained, word-level pre-trained GloVe, character-level, or ELMo embeddings) and data augmentation strategies (SMOTE, loss weights, or GPT-2) to explore their influence on performance. Evaluation results show that our models perform better than the median in most situations.
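To illustrate the "loss weights" strategy named in the abstract, here is a minimal sketch of how class weights could counteract label imbalance in the single-label priority task. The abstract does not state which weighting scheme the authors used, so the inverse-frequency formula (weight = N / (K * n_c)), the function names, and the four priority label names below are all assumptions made for illustration only.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: weight[c] = N / (K * n_c).

    N is the number of examples, K the number of distinct classes,
    and n_c the number of examples of class c. Rare classes get
    larger weights. This scheme is an assumption, not the paper's.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


def weighted_nll(true_class_probs, labels, weights):
    """Weighted mean negative log-likelihood.

    true_class_probs[i] is the probability the model assigned to the
    correct class of example i; each example's loss is scaled by the
    weight of its class and the total is normalised by the weight sum.
    """
    total_weight = sum(weights[y] for y in labels)
    total_loss = sum(
        weights[y] * -math.log(p) for p, y in zip(true_class_probs, labels)
    )
    return total_loss / total_weight


# Hypothetical, imbalanced priority labels (names assumed for illustration).
labels = ["low"] * 6 + ["medium"] * 2 + ["high"] + ["critical"]
weights = balanced_class_weights(labels)
```

With these toy labels, an error on a "critical" tweet contributes six times as much to the loss as an error on a "low" tweet, which is the intended effect of weighting: the minority classes are not drowned out by the majority class during training.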
Type of Material
Conference Publication
Subjects

Emergency response
Text classification
Deep learning

Web versions
https://trec.nist.gov/pubs/trec28/trec2019.html
Language
English
Status of Item
Peer reviewed
Journal
Proceedings of the Twenty-Eighth Text REtrieval Conference (TREC 2019)
Conference Details
The Twenty-Eighth Text REtrieval Conference (TREC 2019), Gaithersburg, Maryland, United States of America, 13-15 November 2019
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
File(s)
Name
Wang2020(3).pdf
Size
936.78 KB
Format
Adobe PDF
Checksum (MD5)
b5ea1e3c7219a782cc90ebb1f8700ca6

Owning collection
Computer Science Research Collection

Item descriptive metadata is released under a CC-0 (public domain) license: https://creativecommons.org/public-domain/cc0/.
All other content is subject to copyright.

For all queries please contact research.repository@ucd.ie.
