Coping with low data availability for social media crisis message categorisation
Author(s)
Date Issued
2023
Date Available
2025-11-14T14:15:57Z
Abstract
During crisis situations, social media allows people to quickly share information, including messages requesting help. This can be valuable to emergency responders, who need to categorise and prioritise these messages based on the type of assistance being requested. However, the high volume of messages makes it difficult to filter and prioritise them without the use of computational techniques. Fully supervised filtering techniques for crisis message categorisation typically require a large amount of annotated training data, but this can be difficult to obtain during an ongoing crisis and is expensive in terms of time and labour to create. This thesis focuses on addressing the challenge of low data availability when categorising crisis messages for emergency response. It first presents domain adaptation as a solution to this problem: a categorisation model is learned from annotated data from past crisis events (source domain) and adapted to categorise messages from an ongoing crisis event (target domain). In many-to-many adaptation, where the model is trained on multiple past events and adapted to multiple ongoing events, a multi-task learning approach is proposed using pre-trained language models. This approach outperforms baselines, and an ensemble approach further improves performance. In one-to-one or many-to-one adaptation, this research studies which combination of past events to include in the model to achieve the best adaptation performance for a particular target event. An approach using sequence-to-sequence pre-trained language models is proposed that incorporates event information for crisis message categorisation, and it is found to outperform existing state-of-the-art methods. The study also finds that using past events that are more similar to the target event tends to lead to better adaptation performance, while using dissimilar events does not improve performance.
However, crisis domain adaptation is only effective when the categorisation task is the same for both the source and target event and there is sufficient annotated data available from the source event. To address the situation where very limited labelled data is available for the target event, the research presents a self-controlled augmentation approach and an optimised iterative self-controlled augmentation approach to generate additional crisis data for model training. These approaches are able to generate high-quality crisis data, leading to better classification performance compared to other methods in the few-shot learning scenario. Additionally, the research presents a method for training a categorisation model in a zero-shot setting, where there is no time to annotate any data for the new event. This involves matching label names with the unlabelled data of the target event and creating a high-confidence pseudo-labelled dataset for model training. The results show that this approach is able to effectively pseudo-label the unlabelled data, resulting in better performance compared to other zero-shot methods. The proposed few-shot and zero-shot approaches are also tested in other domains such as emotion and topic classification, and demonstrate superior generalisation performance compared to baselines in these domains. This thesis contributes to crisis informatics research by addressing the low availability of annotated data for emerging events in crisis message categorisation on social media. The approaches presented in the thesis are developed in close association with real-world situations and show top performance in experiments. The approaches have the potential to be used in practice for timely and effective humanitarian aid response.
Type of Material
Doctoral Thesis
Qualification Name
Doctor of Philosophy (Ph.D.)
Publisher
University College Dublin. School of Computer Science
Copyright (Published Version)
2023 the Author
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
File(s)
Name
PhD_Thesis.pdf
Size
9.72 MB
Format
Adobe PDF
Checksum (MD5)
3bb1a97222d1d5a3279828313d0fe6a9
Owning collection