  • Publication
    Be In The Know: Connecting News Articles to Relevant Twitter Conversations
    In this paper we propose a framework for tracking news articles and automatically connecting them to Twitter conversations, as captured by Twitter hashtags. For example, such a system could alert journalists to news that gets a lot of Twitter reaction, so they can mine those conversations for new developments in the story, promote their article to a set of interested consumers, or gauge general sentiment towards the story. Mapping articles to hashtags is nevertheless challenging, due to the different language styles of articles versus tweets, the streaming nature of the data, and user behavior when marking tweet terms as hashtags. We track the Irish Times RSS feed and a focused Twitter stream over a two-month period, and present a system that assigns hashtags to each article based on its Twitter echo. We propose a machine learning approach for classifying article-hashtag pairs. Our empirical study shows that our system delivers high precision for this task.
  • Publication
    On Supporting Digital Journalism: Case Studies in Co-Designing Journalistic Tools
    Since 2013, researchers at University College Dublin in the Insight Centre for Data Analytics have been involved in a significant research programme in digital journalism, specifically targeting tools and social media guidelines to support the work of journalists. Most of this programme was undertaken in collaboration with The Irish Times. This collaboration involved identifying key problems currently faced by digital journalists, developing tools as solutions to these problems, and then iteratively co-designing these tools with feedback from journalists. This paper reports on our experiences and lessons learned from this research programme, with a view to informing similar efforts in the future.
  • Publication
    Event Detection in Twitter using Aggressive Filtering and Hierarchical Tweet Clustering
    Twitter has become as much a news medium as a social network, and much research has turned to analysing its content to track real-world events, from politics to sports and natural disasters. This paper describes the techniques we employed for the SNOW Data Challenge 2014, described in [16]. We show that aggressive filtering of tweets based on length and structure, combined with hierarchical clustering of tweets and ranking of the resulting clusters, achieves encouraging results. We present empirical results and discussion for two different Twitter streams, one focusing on the US presidential elections in 2012 and the other on the events around Ukraine, Syria and Bitcoin in February 2014.
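The filter-then-cluster pipeline described in this abstract can be illustrated with a minimal sketch. The thresholds, the Jaccard similarity, and the greedy single-pass clustering below are illustrative assumptions, not the paper's actual implementation:

```python
def keep(tweet, min_tokens=4):
    """Aggressive filtering: drop tweets that are too short or structurally
    poor (retweets, tweets dominated by mentions/links)."""
    toks = tweet.split()
    if len(toks) < min_tokens or toks[0] == "RT":
        return False
    noise = sum(t.startswith(("@", "http")) for t in toks)
    return noise / len(toks) < 0.5

def jaccard(a, b):
    """Token-set similarity between two tweets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def cluster(tweets, threshold=0.3):
    """Greedy agglomeration: attach each tweet to the first cluster whose
    seed tweet is similar enough, otherwise start a new cluster."""
    clusters = []
    for t in tweets:
        for c in clusters:
            if jaccard(t, c[0]) >= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    # rank clusters by size, a crude proxy for event salience
    return sorted(clusters, key=len, reverse=True)
```

Running the filter first shrinks the stream drastically, which is what makes clustering the remainder tractable in near real time.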
  • Publication
    Insight4News: Connecting News to Relevant Social Conversations
    We present the Insight4News system, which connects news articles to social conversations as echoed in microblogs such as Twitter. Insight4News tracks feeds from mainstream media, e.g., the BBC and The Irish Times, extracts relevant topics that summarize the tweet activity around each article, recommends relevant hashtags, and presents complementary views and statistics on the tweet activity, related news articles, and the timeline of the story with regard to Twitter reaction. The user can track their own news article or a topic-focused Twitter stream. While many systems tap into the social knowledge of Twitter to help users stay on top of the information wave, none is available for connecting news to relevant Twitter content on a large scale, in real time, with high precision and recall. Insight4News builds on our award-winning Twitter topic detection approach and several machine learning components to deliver news in a social context.
      Scopus© Citations: 6
  • Publication
    Learning-to-Rank for Real-Time High-Precision Hashtag Recommendation for Streaming News
    We address the problem of real-time recommendation of streaming Twitter hashtags to an incoming stream of news articles. The technical challenge can be framed as large-scale topic classification where the set of topics (i.e., hashtags) is huge and highly dynamic. Our main applications come from digital journalism, e.g., for promoting original content to Twitter communities and for social indexing of news to enable better retrieval, story tracking and summarisation. In contrast to state-of-the-art methods that focus on modelling each individual hashtag as a topic, we propose a learning-to-rank approach for modelling hashtag relevance, and present methods to extract time-aware features from highly dynamic content. We present the data collection and processing pipeline, as well as our methodology for achieving low-latency, high-precision recommendations. Our empirical results show that our method outperforms the state-of-the-art, delivering more than 80% precision. Our techniques are implemented in a real-time system, and are currently under user trial with a big news organisation.
      Scopus© Citations: 28
  • Publication
    Hashtagger+: Efficient High-Coverage Social Tagging of Streaming News
    News and social media now play a synergistic role and neither domain can be grasped in isolation. On one hand, platforms such as Twitter have taken a central role in the dissemination and consumption of news. On the other hand, news editors rely on social media for following their audiences' attention and for crowd-sourcing news stories. Twitter hashtags function as a key connection between Twitter crowds and the news media, by naturally naming and contextualizing stories, grouping the discussion of news and marking topic trends. In this work we propose Hashtagger+, an efficient learning-to-rank framework for merging news and social streams in real time, by recommending Twitter hashtags to news articles. We provide an extensive study of different approaches for streaming hashtag recommendation, and show that pointwise learning-to-rank is more effective than multi-class classification as well as more complex learning-to-rank approaches. We improve the efficiency and coverage of a state-of-the-art hashtag recommendation model by proposing new techniques for data collection and feature computation. In our comprehensive evaluation on real data we show that we drastically outperform the accuracy and efficiency of prior methods. Our prototype system delivers recommendations in under 1 minute, with a Precision@1 of 94% and article coverage of 80%. This is an order of magnitude faster than prior approaches, and brings improvements of 5% in precision and 20% in coverage. By effectively linking the news stream to the social stream via the recommended hashtags, we open the door to solving many challenging problems related to story detection and tracking. To showcase this potential, we present an application of our recommendations to automated news story tracking via social tags. Our recommendation framework is implemented in a real-time Web system available from
      Scopus© Citations: 21
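The pointwise learning-to-rank idea favoured in the Hashtagger+ abstract can be sketched in a few lines: score each (article, hashtag) pair independently with a relevance classifier, then rank candidates by score. The logistic model, the feature names, and the toy training loop below are assumptions for illustration, not the system's actual features or learner:

```python
import math

def score(w, x):
    """Pointwise relevance score for one (article, hashtag) pair, whose
    features x might be, e.g., [text similarity, hashtag recency]."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

def train(pairs, labels, lr=0.5, epochs=200):
    """Fit a logistic relevance model on labelled pairs by stochastic
    gradient ascent; the ranking is induced later by sorting on score."""
    w = [0.0] * len(pairs[0])
    for _ in range(epochs):
        for x, y in zip(pairs, labels):
            p = score(w, x)
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def recommend(w, candidates, k=1):
    """Rank an article's candidate hashtags by pointwise score."""
    ranked = sorted(candidates, key=lambda c: score(w, c[1]), reverse=True)
    return [tag for tag, _ in ranked[:k]]
```

Because each pair is scored independently, new hashtags need no per-topic model, which is what makes the pointwise formulation workable for a huge, dynamic tag set.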
  • Publication
    A Distributed Asynchronous Deep Reinforcement Learning Framework for Recommender Systems
    In this paper we propose DADRL, a distributed, asynchronous reinforcement learning recommender system based on the asynchronous advantage actor-critic model (A3C), which combines ideas from A3C and federated learning (FL). The proposed algorithm keeps user preferences and interactions on local devices and uses a combination of on-device, local recommendation models and a complementary global model. The global model is trained only on the loss gradients of the local models, rather than directly on user preference or interaction data. We demonstrate, using well-known datasets and benchmark algorithms, how this approach can deliver performance that is comparable with the current state-of-the-art while enhancing user privacy.
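The privacy mechanism described here, where only loss gradients leave the device, follows the usual federated pattern. A minimal sketch of that pattern is below; the linear model, squared-error loss, and synchronous averaging are simplifying assumptions (DADRL itself uses asynchronous A3C actor-critic updates):

```python
def local_gradient(w_global, interactions):
    """On-device step: compute a loss gradient from private interaction
    data (here, squared error of a linear score). Only this gradient is
    sent to the server, never the interactions themselves."""
    grad = [0.0] * len(w_global)
    for x, y in interactions:
        pred = sum(wi * xi for wi, xi in zip(w_global, x))
        err = pred - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(interactions)
    return grad

def apply_updates(w_global, grads, lr=0.1):
    """Server step: the global model sees only averaged client gradients."""
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(w_global))]
    return [wi - lr * gi for wi, gi in zip(w_global, avg)]
```

Each client keeps its data local and the server still converges, because the averaged gradients carry enough signal to fit the global model.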
  • Publication
    Scalable Disambiguation System Capturing Individualities of Mentions
    Entity disambiguation, or mapping a phrase to its canonical representation in a knowledge base, is a fundamental step in many natural language processing applications. Existing techniques based on global ranking models fail to capture the individual peculiarities of words and hence struggle to meet the accuracy-time requirements of many real-world applications. In this paper, we propose a new system that learns specialized features and models for disambiguating each ambiguous phrase in the English language. We train and validate hundreds of thousands of learning models for this purpose using a Wikipedia hyperlink dataset with more than 170 million labelled annotations. The computationally intensive training required for this approach can be distributed over a cluster. In addition, our approach supports fast queries and efficient updates, and its accuracy compares favorably with other state-of-the-art disambiguation systems.
      Scopus© Citations: 2
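The one-model-per-ambiguous-phrase idea in this abstract can be sketched as a dictionary of tiny, independently trainable models, which is also what makes the training embarrassingly parallel across a cluster. The context-overlap scorer and the example entities below are illustrative assumptions, not the paper's actual features:

```python
from collections import Counter, defaultdict

class MentionModel:
    """One specialized model per ambiguous phrase: scores each candidate
    entity by overlap between the query context and the context words
    observed with that entity in labelled training data."""
    def __init__(self):
        self.profiles = defaultdict(Counter)

    def fit(self, examples):
        """examples: iterable of (context_words, entity) pairs."""
        for ctx, entity in examples:
            self.profiles[entity].update(w.lower() for w in ctx)
        return self

    def predict(self, ctx):
        ctx = {w.lower() for w in ctx}
        return max(self.profiles,
                   key=lambda e: sum(self.profiles[e][w] for w in ctx))

# phrase -> model; each entry can be trained on a different cluster node
models = {}
models["tree"] = MentionModel().fit([
    (["binary", "search", "node"], "Tree_(data_structure)"),
    (["forest", "oak", "leaf"], "Tree"),
])
```

Queries stay fast because disambiguating a mention only touches that phrase's own small model, and updating one phrase never requires retraining the rest.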