  • Publication
    Real time event monitoring with Trident
    Building a scalable, fault-tolerant stream mining system that deals with realistic data volumes presents unique challenges. Considerable work is being done to make the development of such systems simpler by creating high-level abstractions on top of existing systems. Many of the technical barriers can be eliminated by adopting a state-of-the-art interface, such as the Trident API for Storm. We describe a stream mining tool, based on Trident, for monitoring breaking news events on Twitter, which can be extended quickly and scaled easily.
  • Publication
    Detecting Attention Dominating Moments Across Media Types
    (CEUR Workshop Proceedings, 2016-03-20)
    In this paper we address the problem of identifying attention dominating moments in online media. We are interested in discovering moments when everyone seems to be talking about the same thing. We investigate one particular aspect of breaking news: the tendency of multiple sources to concentrate attention on a single topic, leading to a collapse in diversity of content for a period of time. In this work we show that diversity at a topic level is effective for capturing this effect in blogs, in news articles, and on Twitter. The phenomenon is present in three distinctly different media types, each with their own unique features. We describe the phenomenon using case studies relating to major news stories from September 2015.
  • Publication
    Analyzing Discourse Communities with Distributional Semantic Models
    This paper presents a new corpus-driven approach applicable to the study of language patterns in social and political contexts, or Critical Discourse Analysis (CDA), using Distributional Semantic Models (DSMs). This approach considers changes in word semantics, both over time and between communities with differing viewpoints. The geometrical spaces constructed by DSMs, or 'word spaces', offer an objective, robust exploratory analysis tool for revealing novel patterns and similarities between communities, as well as highlighting when these changes occur. To quantify differences between word spaces built on different time periods and from different communities, we analyze the nearest neighboring words in the DSM, a process we relate to analyzing 'concordance lines'. This makes the approach intuitive and interpretable to practitioners. We demonstrate the usefulness of the approach with two case studies, following groups with opposing political ideologies in the Scottish Independence Referendum and the US Midterm Elections 2014.
  • Publication
    A system for Twitter user list curation
    With increased adoption of social networking tools, it is becoming more difficult to extract useful information from the mass of data generated daily by users. Curation of content and sources is an important filter in separating the signal from noise. A good set of credible sources usually requires painstaking manual curation, which often yields incomplete coverage of a topic. In this demo, we present a recommender system to aid this process, improving the quality and quantity of sources. The system is highly adaptable to the goals of the curator, enabling some novel uses for curating and monitoring lists of users.
  • Publication
    Dimensionality Reduction and Visualisation Tools for Voting Records
    (CEUR Workshop Proceedings, 2016-09-21)
    Recorded votes in legislative bodies are an important source of data for political scientists. Voting records can be used to describe parliamentary processes, identify ideological divides between members and reveal the strength of party cohesion. We explore the problem of working with vote data using popular dimensionality reduction techniques and cluster validation methods, as an alternative to more traditional scaling techniques. We present results of dimensionality reduction techniques applied to votes from the 6th and 7th European Parliaments, covering activity from 2004 to 2014.
  • Publication
    Adaptive Representations for Tracking Breaking News on Twitter
    Twitter is often the most up-to-date source for finding and tracking breaking news stories. Therefore, there is considerable interest in developing filters for tweet streams in order to track and summarize stories. This is a non-trivial text analytics task, as tweets are short and standard text similarity metrics often fail as stories evolve over time. In this paper we examine the effectiveness of adaptive text similarity mechanisms for tracking and summarizing breaking news stories. We evaluate the effectiveness of these mechanisms on a number of recent news events for which manually curated timelines are available. Assessments based on the ROUGE metric indicate that an adaptive similarity mechanism is best suited for tracking evolving stories on Twitter.
  • Publication
    Event Detection in Twitter using Aggressive Filtering and Hierarchical Tweet Clustering
    Twitter has become as much a news medium as a social network, and much research has turned to analysing its content for tracking real-world events, from politics to sports and natural disasters. This paper describes the techniques we employed for the SNOW Data Challenge 2014, described in [16]. We show that aggressive filtering of tweets based on length and structure, combined with hierarchical clustering of tweets and ranking of the resulting clusters, achieves encouraging results. We present empirical results and discussion for two different Twitter streams, one focusing on the US presidential elections in 2012 and the other on the events concerning Ukraine, Syria and Bitcoin in February 2014.
  • Publication
    From Detection to Discourse: Tracking Events and Communities in Breaking News
    (University College Dublin. School of Computer Science, 2016-12)
    Online social networks are now an established part of our reality. People no longer rely solely on traditional media outlets to stay informed. Collectively, acts of citizen journalism have transformed news consumers into producers. Keeping up with the overwhelming volume of user-generated content from social media sources is challenging for even well-resourced news organisations. Filtering the most relevant content is not trivial. Significant demand exists for editorial support systems that enable journalists to work more effectively. Social newsgathering introduces many new challenges to the tasks of detecting and tracking breaking news stories. In detection, substantial volumes of data introduce scalability challenges. When tracking developing stories, approaches developed on static collections of documents often fail to capture important changes in the content or structure of data over time. Furthermore, systems tuned on static collections can perform poorly on new, unseen data. To understand significant events, we must also consider the people and organisations who are generating content related to these events. Newsworthy sources are rarely objective and neutral, and in some cases are purposefully created for disinformation, giving rise to the "fake news" phenomenon. An individual's political ideology will inform and influence their choice of language, especially during significant political events such as elections, protests, and other polarising incidents. This thesis presents techniques developed with the intention of supporting journalists who monitor social media for breaking news. The work starts with the curation of newsworthy sources, proceeds to implementing an alert system for breaking news events and tracking the evolution of these stories over time, and finally explores the language used by different communities to gain insights into the discourse around an event.
    As well as detecting and tracking significant events, it is of interest to identify the differences in language patterns between groups of people around those events. Distributional semantic language models offer a way to quantify certain aspects of discourse, allowing us to track how different communities use language, thereby revealing their stances on key issues.