Sentence-Level Event Classification in Unstructured Texts
|Title:||Sentence-Level Event Classification in Unstructured Texts|
|Authors:||Naughton, Martina; Stokes, Nicola; Carthy, Joe|
|Permanent link:||http://hdl.handle.net/10197/12371|
|Date:||Sep-2008|
|Online since:||2021-07-30T16:12:38Z|
|Abstract:||The ability to correctly classify sentences that describe events is an important task for many natural language applications such as Question Answering (QA) and Text Summarisation. In this paper, we treat event detection as a sentence-level text classification problem. We compare the performance of two approaches to this task: a Support Vector Machine (SVM) classifier and a Language Modeling (LM) approach. We also investigate a rule-based method that uses hand-crafted lists of ‘trigger’ terms derived from WordNet. We use two datasets in our experiments and test each approach using six different event types, i.e., Die, Attack, Injure, Meet, Transport and Charge-Indict. Our experimental results indicate that although the trained SVM classifier consistently outperforms the language modeling approach, our rule-based system marginally outperforms the trained SVM classifier on three of our six event types. We also observe that overall performance is greatly affected by the type of corpus used to train the algorithms. Specifically, we have found that a homogeneous training corpus that contains many instances of a specific event type (i.e., Die events in the recent Iraqi war) produces a poorer-performing classifier than one trained on a heterogeneous dataset containing more diverse instances of the event (i.e., Die events in many different settings, for example, traffic accidents, natural disasters, etc.). Our heterogeneous dataset is provided by the ACE (Automatic Content Extraction) initiative, while our novel homogeneous dataset consists of news articles and annotated Die events from the Iraq Body Count (IBC) database. Overall, our results show that the techniques presented here are effective solutions to the event classification task described in this paper, where F1 scores of over 90% are achieved.|
|Item notes:||Technical report numbers ucd-csi-2008-06 and ucd-csi-2008-07 are identical; only one copy has been retained.|
|Funding Details:||Irish Research Council for Science, Engineering and Technology|
|Type of material:||Technical Report|
|Publisher:||University College Dublin. School of Computer Science and Informatics|
|Series/Report no.:||UCD CSI Technical Reports; ucd-csi-2008-6; UCD CSI Technical Reports; ucd-csi-2008-7|
|Copyright (published version):||2008 the Authors|
|Keywords:||Event classification; Natural language processing; Statistical methods; Text corpora; Machine learning; Error analysis|
|Other versions:||https://web.archive.org/web/20080226040105/http:/csiweb.ucd.ie/Research/TechnicalReports.html|
|Language:||en|
|Status of Item:||Not peer reviewed|
|This item is made available under a Creative Commons License:||https://creativecommons.org/licenses/by-nc-nd/3.0/ie/|
|Appears in Collections:||Computer Science and Informatics Technical Reports|
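
The rule-based method mentioned in the abstract labels a sentence as describing an event when it contains a trigger term for that event type. The following is a minimal sketch of that idea, not the authors' implementation: the `TRIGGERS` lists are hypothetical and hand-written for illustration (the report derives its lists from WordNet), and the names `tokenize`, `classify` and `f1_score` are assumptions introduced here.

```python
import re

# Hypothetical trigger lists, written by hand for illustration only.
# The report's WordNet-derived lists are not reproduced here.
TRIGGERS = {
    "Die": {"die", "died", "dies", "killed", "dead", "death"},
    "Attack": {"attack", "attacked", "bombing", "bombed", "shot"},
    "Injure": {"injure", "injured", "wounded", "hurt"},
}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def classify(sentence, event_type):
    """Return True if the sentence contains any trigger term for event_type."""
    return any(tok in TRIGGERS[event_type] for tok in tokenize(sentence))


def f1_score(gold, predicted):
    """Compute F1 for binary labels given as parallel lists of booleans."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy sentences invented for this sketch; they are not from ACE or IBC.
sentences = [
    "Three people died in the crash.",
    "The ministers will meet on Friday.",
    "A soldier was killed near the border.",
    "He was dead tired after the match.",
]
gold = [True, False, True, False]
predicted = [classify(s, "Die") for s in sentences]
print(f1_score(gold, predicted))
```

On this toy data the last sentence is a false positive ("dead tired"), which illustrates why plain trigger matching can trade precision for recall and why the report compares it against trained classifiers.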