- Publication: Linguistically Informed Tweet Categorization for Online Reputation Management. Determining relevant content automatically is a challenging task for any aggregation system. In the business intelligence domain, particularly in the application area of Online Reputation Management, it may be desirable to label tweets either as customer comments that deserve rapid attention or as tweets from industry experts or sources about the higher-level operations of a particular entity. We present an approach that uses a combination of linguistic and Twitter-specific features to represent tweets, and we examine how well these features distinguish between tweets labelled using Amazon's Mechanical Turk crowdsourcing platform. Features such as part-of-speech tags and function words prove highly effective at discriminating between the two categories of tweet across several distinct entity types, and Twitter-related metrics such as the presence of hashtags, retweets and user mentions also add to classification accuracy. An accuracy of 86% is reported using an SVM classifier with a mixed set of these features on a corpus of tweets related to seven business entities.
- Publication: UCD: Diachronic Text Classification with Character, Word, and Syntactic N-grams. We present our submission to SemEval-2015 Task 7: Diachronic Text Evaluation, in which we approach the task of assigning a date to a text as a multi-class classification problem. We extract n-gram features from the text at the letter, word, and syntactic level, and use these to train a classifier on date-labelled training data. We also incorporate date probabilities of syntactic features as estimated from a very large external corpus of books. Our system achieved the highest performance of all systems on subtask 2: identifying texts by their time-specific language use.
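The tweet-categorization abstract combines Twitter-specific metrics (hashtags, retweets, user mentions) with linguistic cues such as function words. The sketch below shows one way such a feature vector could be computed with only the standard library. It is an illustration under stated assumptions, not the authors' feature set: the function-word list, the `RT @user` retweet convention, and the name `twitter_features` are all hypothetical choices. The resulting vectors could then be passed to an SVM implementation such as scikit-learn's `LinearSVC`.

```python
import re

# Hypothetical, deliberately small function-word list; the paper's list is not given.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "or",
                  "for", "on", "is", "are", "with", "at", "by"}

HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
# Assumption: a retweet is detected by the conventional "RT @user" prefix.
RETWEET_RE = re.compile(r"^RT\s+@\w+", re.IGNORECASE)


def twitter_features(tweet: str) -> dict[str, float]:
    """Map one tweet to a small dict of numeric features (hypothetical design)."""
    # Tokenize into words, keeping '#' / '@' prefixes so they can be excluded below.
    tokens = re.findall(r"[#@]?\w+", tweet.lower())
    words = [t for t in tokens if not t.startswith(("#", "@"))]
    n_function = sum(1 for w in words if w in FUNCTION_WORDS)
    return {
        "has_hashtag": float(bool(HASHTAG_RE.search(tweet))),
        "num_mentions": float(len(MENTION_RE.findall(tweet))),
        "is_retweet": float(bool(RETWEET_RE.match(tweet))),
        # Share of plain words that are function words (0.0 for an empty tweet).
        "function_word_ratio": n_function / len(words) if words else 0.0,
    }


if __name__ == "__main__":
    print(twitter_features("RT @acme: The new plant opens in #Dublin"))
```

Part-of-speech tag features, which the abstract reports as highly effective, would need a tagger and are left out of this sketch.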
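The diachronic-classification abstract treats dating a text as multi-class classification over character, word, and syntactic n-grams. The sketch below follows that framing with standard-library code only, but several pieces are assumptions rather than the submission's actual method. Part-of-speech tag n-grams stand in for "syntactic n-grams", multinomial naive Bayes is used as the classifier because the abstract does not name one, and the date-range labels are invented. The external book-corpus date probabilities are not modelled.

```python
import math
from collections import Counter
from typing import Optional


def char_ngrams(text: str, n: int) -> list[str]:
    """Character n-grams, with a space of padding marking text boundaries."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def word_ngrams(tokens: list[str], n: int) -> list[str]:
    """Contiguous n-grams over a token (or tag) sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text: str, pos_tags: Optional[list[str]] = None) -> Counter:
    """Bag of prefixed n-gram counts; POS n-grams are a hypothetical stand-in for syntax."""
    lowered = text.lower()
    tokens = lowered.split()
    feats: Counter = Counter()
    for n in (2, 3):
        feats.update("c:" + g for g in char_ngrams(lowered, n))
    for n in (1, 2):
        feats.update("w:" + g for g in word_ngrams(tokens, n))
    if pos_tags:
        for n in (1, 2):
            feats.update("s:" + g for g in word_ngrams(pos_tags, n))
    return feats


class NaiveBayesDater:
    """Multinomial naive Bayes over n-gram counts with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        self.feature_counts: dict[str, Counter] = {}
        self.vocab: set[str] = set()

    def fit(self, docs: list[Counter], labels: list[str]) -> "NaiveBayesDater":
        for feats, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.feature_counts.setdefault(label, Counter()).update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats: Counter) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of every observed n-gram.
            score = math.log(n_docs / total_docs)
            for f, c in feats.items():
                score += c * math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A toy usage, with made-up date-range labels:

```python
train = [extract_features("thou art most gracious and thy servant"),
         extract_features("the smartphone app crashed again online")]
model = NaiveBayesDater().fit(train, ["1700-1749", "2000-2049"])
print(model.predict(extract_features("thy servant art gracious")))
```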