  • Publication
    Score Normalization and Aggregation for Active Learning in Multi-label Classification
    (University College Dublin. School of Computer Science and Informatics, 2010-02)
    Active learning is useful in situations where labeled data is scarce, unlabeled data is available, and labeling a large number of examples is costly or impractical. These techniques help by identifying a minimal set of examples to label that will support the training of an effective classifier. Thus active learning is particularly relevant for the automation of annotation tasks in multimedia. In this paper we consider the problem of employing active learning for the assignment of multiple annotations or “tags” to images in personal image collections. This form of multi-label classification has received a lot of attention in recent years; however, active multi-label classification is still a new research area. The main challenge in active multi-label classification is the selection of unlabeled examples that will be informative for all tags under consideration. This selection task proves surprisingly difficult, primarily because of the paucity of labeled data available. In this paper we present some solutions to this problem based on aggregated rankings from classifiers for individual tags.
  • Publication
    Using crowdsourcing and active learning to track sentiment in online media
    Tracking sentiment in the popular media has long been of interest to media analysts and pundits. With the availability of news content via online syndicated feeds, it is now possible to automate some aspects of this process. There is also great potential to crowdsource much of the annotation work that is required to train a machine learning system to perform sentiment scoring. We describe such a system for tracking economic sentiment in online media that has been deployed since August 2009. It uses annotations provided by a cohort of non-expert annotators to train a learning system to classify a large body of news items. We report on the design challenges addressed in managing the effort of the annotators and in making annotation an interesting experience.
  • Publication
    Deriving insights from national happiness indices
    In online social media, individuals produce vast amounts of content which in effect "instruments" the world around us. Users on sites such as Twitter are publicly broadcasting status updates that provide an indication of their mood at a given moment in time, often accompanied by geolocation information. A number of strategies exist to aggregate such content to produce sentiment scores in order to build a "happiness index". In this paper, we describe such a system based on Twitter that maintains a happiness index for nine US cities. The main contribution of this paper is a companion system called SentireCrowds that allows us to identify the underlying causes behind shifts in sentiment. This ability to analyse the components of the sentiment signal highlights a number of problems. It shows that sentiment scoring on social media data without considering context is difficult. More importantly, it highlights cases where sentiment scoring methods are susceptible to unexpected shifts due to noise and trending memes.
  • Publication
    A latent space mapping for link prediction
    Network modeling can be approached using either discriminative or probabilistic models. In the task of link prediction a probabilistic model will give a probability for the existence of a link; while in some scenarios this may be beneficial, in others a hard discriminative boundary needs to be set, in which case the use of a discriminative classifier is preferable. In domains such as image analysis and speaker recognition, probabilistic models have been used as a mechanism from which features can be extracted. This paper examines using a probabilistic model built on the entire graph to extract features to predict the existence of unknown links between two nodes. It demonstrates how features extracted from the model, as well as the predicted probability of a link existing, can aid the classification process.
  • Publication
    An Evaluation of One-Class Classification Techniques for Speaker Verification
    (University College Dublin. School of Computer Science and Informatics, 2007-08-13)
    Speaker verification is a challenging problem in speaker recognition where the objective is to determine whether a segment of speech in fact comes from a specific individual. In supervised machine learning terms this is a challenging problem as, while examples belonging to the target class are easy to gather, the set of counterexamples is completely open. In this paper we cast this as a one-class classification problem and evaluate a variety of state-of-the-art one-class classification techniques on a benchmark speech recognition dataset. We show that of the one-class classification techniques, Gaussian Mixture Models show the best performance on this task.
  • Publication
    Taking the pulse of the web : assessing sentiment on topics in online media
    The task of identifying sentiment trends in the popular media has long been of interest to analysts and pundits. Until recently, this task has required professional annotators to manually inspect individual articles in order to identify their polarity. With the increased availability of large volumes of online news content via syndicated feeds, researchers have begun to examine ways to automate aspects of this process. In this work, we describe a sentiment analysis system that uses crowdsourcing to gather non-expert annotations for economic news articles. By using these annotations in conjunction with a supervised machine learning strategy, we can generalize to label a much larger set of articles, allowing us to effectively track sentiment in different news sources over time.