Sentence-Level Event Classification in Unstructured Texts

Files in This Item:
 File: ucd-csi-2008-6.pdf
 Size: 270.43 kB
 Format: Adobe PDF
Title: Sentence-Level Event Classification in Unstructured Texts
Authors: Naughton, Martina; Stokes, Nicola; Carthy, Joe
Permanent link: http://hdl.handle.net/10197/12371
Date: Sep-2008
Online since: 2021-07-30T16:12:38Z
Abstract: The ability to correctly classify sentences that describe events is an important task for many natural language applications such as Question Answering (QA) and Text Summarisation. In this paper, we treat event detection as a sentence-level text classification problem. We compare the performance of two approaches to this task: a Support Vector Machine (SVM) classifier and a Language Modeling (LM) approach. We also investigate a rule-based method that uses hand-crafted lists of ‘trigger’ terms derived from WordNet. We use two datasets in our experiments and test each approach using six different event types, i.e., Die, Attack, Injure, Meet, Transport and Charge-Indict. Our experimental results indicate that although the trained SVM classifier consistently outperforms the language modeling approach, our rule-based system marginally outperforms the trained SVM classifier on three of our six event types. We also observe that overall performance is greatly affected by the type of corpus used to train the algorithms. Specifically, we have found that a homogeneous training corpus that contains many instances of a specific event type (e.g., Die events in the recent Iraqi war) produces a poorer-performing classifier than one trained on a heterogeneous dataset containing more diverse instances of the event (e.g., Die events in many different settings, such as traffic accidents and natural disasters). Our heterogeneous dataset is provided by the ACE (Automatic Content Extraction) initiative, while our novel homogeneous dataset consists of news articles and annotated Die events from the Iraq Body Count (IBC) database. Overall, our results show that the techniques presented here are effective solutions to the event classification task described in this paper, where F1 scores of over 90% are achieved.
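Illustrative sketch (not taken from the report): the rule-based idea in the abstract, labelling a sentence as an event sentence when it contains a trigger term, might look roughly like the Python below. The trigger lists, the tokenizer and the function names are all hypothetical placeholders; the report's own lists are hand-crafted from WordNet and are not reproduced here. The F1 helper uses the standard definition, the harmonic mean of precision and recall.

```python
import re

# Hypothetical trigger terms, for illustration only. The report's lists
# are hand-crafted from WordNet and are not reproduced here.
TRIGGERS = {
    "Die": {"die", "died", "dies", "killed", "dead", "death"},
    "Attack": {"attack", "attacked", "bombing", "bombed", "shot"},
}


def tokenize(sentence):
    """Lowercase the sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def is_event(sentence, event_type):
    """Return True if any token is a trigger term for event_type."""
    triggers = TRIGGERS[event_type]
    return any(token in triggers for token in tokenize(sentence))


def f1_score(gold, predicted):
    """Standard F1: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up sentences and labels, not data from ACE or IBC.
    sentences = [
        "Three soldiers were killed in the blast.",
        "The minister met reporters on Tuesday.",
        "He passed away last night.",
        "The deadline is dead serious.",
    ]
    gold = [True, False, True, False]
    predicted = [is_event(s, "Die") for s in sentences]
    print(predicted, f1_score(gold, predicted))
```

On these made-up sentences the sketch misses a paraphrase ("passed away") and fires on a non-event use of "dead", which is the kind of recall and precision error a trigger list is prone to.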
Item notes: Technical report numbers ucd-csi-2008-06 and ucd-csi-2008-07 are identical; only one copy has been retained.
Funding Details: Irish Research Council for Science, Engineering and Technology
Type of material: Technical Report
Publisher: University College Dublin. School of Computer Science and Informatics
Series/Report no.: UCD CSI Technical Reports; ucd-csi-2008-6; UCD CSI Technical Reports; ucd-csi-2008-7
Copyright (published version): 2008 the Authors
Keywords: Event classification; Natural language processing; Statistical methods; Text corpora; Machine learning; Error analysis
Other versions: https://web.archive.org/web/20080226040105/http:/csiweb.ucd.ie/Research/TechnicalReports.html
Language: en
Status of Item: Not peer reviewed
This item is made available under a Creative Commons License: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
Appears in Collections: Computer Science and Informatics Technical Reports

