Classification for Crisis-Related Tweets Leveraging Word Embeddings and Data Augmentation
Author(s)
Date Issued
2019-11-15
Date Available
2024-05-02T15:58:33Z
Abstract
This paper presents University College Dublin’s (UCD) work at the TREC 2019-B Incident Streams (IS) track. The IS track aims to identify actionable messages in a stream of crisis-related tweets and to estimate their priority. Following the track’s requirements, we break the task down into two sub-tasks. The first is a multi-label classification task that assigns each incoming tweet to one or more types of aid request. The second is a single-label classification task that assigns each tweet one of four priority levels. We submitted four runs to the track, each using a different model for the two sub-tasks. Our baseline run trains classifiers on hand-crafted features using traditional machine learning methods, namely Logistic Regression and Naïve Bayes. The other three runs train classifiers with different deep learning methods: a vanilla bidirectional long short-term memory recurrent neural network (biLSTM), an adapted biLSTM, and a bi-attentive classification network (BCN) with pre-trained contextualised ELMo embeddings. Across the runs, we apply different word embeddings (in-domain pre-trained, word-level pre-trained GloVe, character-level, or ELMo embeddings) and data augmentation strategies (SMOTE, loss weights, or GPT-2) to explore their influence on performance. Evaluation results show that our models perform better than the median in most cases.
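One of the imbalance strategies named above is loss weighting. The abstract does not state which weighting formula the runs used, so the sketch below assumes the common "balanced" inverse-frequency scheme, w_c = N / (K * n_c), applied to the four-level priority sub-task. The priority label names (Low, Medium, High, Critical) and the helper functions `balanced_class_weights` and `weighted_cross_entropy` are hypothetical illustrations, not the authors' code.

```python
import math
from collections import Counter

# Assumed priority label set for illustration only; the abstract says
# "four levels of priority" but these names are an assumption of this sketch.
PRIORITY_LEVELS = ("Low", "Medium", "High", "Critical")


def balanced_class_weights(labels, classes=PRIORITY_LEVELS):
    """Return w_c = N / (K * n_c) for each class c (hypothetical helper).

    N is the number of training labels, K the number of classes and n_c
    the count of class c, so rarer classes receive larger weights.
    Classes that never occur get weight 0.0 (an arbitrary choice here).
    """
    counts = Counter(labels)
    n, k = len(labels), len(classes)
    return {c: (n / (k * counts[c]) if counts[c] else 0.0) for c in classes}


def weighted_cross_entropy(probs, gold, weights):
    """Mean over examples of -w[y] * log p(y) (hypothetical helper).

    probs: list of dicts mapping each class to its predicted probability.
    gold:  list of gold class labels, aligned with probs.
    """
    total = 0.0
    for p, y in zip(probs, gold):
        total -= weights[y] * math.log(p[y])
    return total / len(gold)


if __name__ == "__main__":
    # Toy, made-up training labels skewed towards the lowest priority.
    train = ["Low"] * 6 + ["Medium"] * 2 + ["High", "Critical"]
    weights = balanced_class_weights(train)
    print(weights)

    preds = [{"Low": 0.5, "Medium": 0.2, "High": 0.2, "Critical": 0.1}]
    print(weighted_cross_entropy(preds, ["Critical"], weights))
```

With these toy labels the rare High and Critical classes receive a weight of 2.5 against roughly 0.42 for Low, so an error on a critical tweet costs about six times as much in the loss. Loss weighting leaves the training data unchanged; by contrast, SMOTE synthesises extra minority-class feature vectors, and GPT-2 can generate additional minority-class text.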
Type of Material
Conference Publication
Language
English
Status of Item
Peer reviewed
Journal
Proceedings of the Twenty-Eighth Text REtrieval Conference (TREC 2019)
Conference Details
The Twenty-Eighth Text REtrieval Conference (TREC 2019), Gaithersburg, Maryland, United States of America, 13-15 November 2019
This item is made available under a Creative Commons License
File(s)
Name
Wang2020(3).pdf
Size
936.78 KB
Format
Adobe PDF
Checksum (MD5)
b5ea1e3c7219a782cc90ebb1f8700ca6