  • Publication
    Analyzing Discourse Communities with Distributional Semantic Models
    This paper presents a new corpus-driven approach applicable to the study of language patterns in social and political contexts, or Critical Discourse Analysis (CDA), using Distributional Semantic Models (DSMs). This approach considers changes in word semantics, both over time and between communities with differing viewpoints. The geometrical spaces constructed by DSMs, or 'word spaces', offer an objective, robust exploratory analysis tool for revealing novel patterns and similarities between communities, as well as highlighting when these changes occur. To quantify differences between word spaces built on different time periods and from different communities, we analyze the nearest neighboring words in the DSM, a process we relate to analyzing 'concordance lines'. This makes the approach intuitive and interpretable to practitioners. We demonstrate the usefulness of the approach with two case studies, following groups with opposing political ideologies in the Scottish Independence Referendum, and the US Midterm Elections 2014.
    Scopus© Citations 17  1441
  • Publication
    Detecting Attention Dominating Moments Across Media Types
    (CEUR Workshop Proceedings, 2016-03-20)
    In this paper we address the problem of identifying attention dominating moments in online media. We are interested in discovering moments when everyone seems to be talking about the same thing. We investigate one particular aspect of breaking news: the tendency of multiple sources to concentrate attention on a single topic, leading to a collapse in diversity of content for a period of time. In this work we show that diversity at a topic level is effective for capturing this effect in blogs, in news articles, and on Twitter. The phenomenon is present in three distinctly different media types, each with their own unique features. We describe the phenomenon using case studies relating to major news stories from September 2015.
      180
  • Publication
    How Many Topics? Stability Analysis for Topic Models
    Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the over-clustering of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process.
    Scopus© Citations 129  563
  • Publication
    Using crowdsourcing and active learning to track sentiment in online media
    Tracking sentiment in the popular media has long been of interest to media analysts and pundits. With the availability of news content via online syndicated feeds, it is now possible to automate some aspects of this process. There is also great potential to crowdsource much of the annotation work that is required to train a machine learning system to perform sentiment scoring. We describe such a system for tracking economic sentiment in online media that has been deployed since August 2009. It uses annotations provided by a cohort of non-expert annotators to train a learning system to classify a large body of news items. We report on the design challenges addressed in managing the effort of the annotators and in making annotation an interesting experience.
    Scopus© Citations 67  2915
  • Publication
    Hierarchical Modularity and the Evolution of Genetic Interactomes across Species
    To date, cross-species comparisons of genetic interactomes have been restricted to small or functionally related gene sets, limiting our ability to infer evolutionary trends. To facilitate a more comprehensive analysis, we constructed a genome-scale epistasis map (E-MAP) for the fission yeast Schizosaccharomyces pombe, providing phenotypic signatures for ~60% of the nonessential genome. Using these signatures, we generated a catalog of 297 functional modules, and we assigned function to 144 previously uncharacterized genes, including mRNA splicing and DNA damage checkpoint factors. Comparison with an integrated genetic interactome from the budding yeast Saccharomyces cerevisiae revealed a hierarchical model for the evolution of genetic interactions, with conservation highest within protein complexes, lower within biological processes, and lowest between distinct biological processes. Despite the large evolutionary distance and extensive rewiring of individual interactions, both networks retain conserved features and display similar levels of functional crosstalk between biological processes, suggesting general design principles of genetic interactomes.
    Scopus© Citations 152  630
  • Publication
    A Latent Space Analysis of Editor Lifecycles in Wikipedia
    Collaborations such as Wikipedia are a key part of the value of the modern Internet. At the same time there is concern that these collaborations are threatened by high levels of member turnover. In this paper we borrow ideas from topic analysis to map editor activity on Wikipedia over time into a latent space that offers an insight into the evolving patterns of editor behavior. This latent space representation reveals a number of different categories of editor (e.g. content experts, social networkers) and we show that it does provide a signal that predicts an editor's departure from the community. We also show that long term editors gradually diversify their participation by shifting edit preference from one or two namespaces to multiple namespaces and experience relatively soft evolution in their editor profiles, while short term editors generally distribute their contribution randomly among the namespaces and experience considerably fluctuated evolution in their editor profiles.
      355
  • Publication
    Deriving insights from national happiness indices
    In online social media, individuals produce vast amounts of content which in effect "instruments" the world around us. Users on sites such as Twitter are publicly broadcasting status updates that provide an indication of their mood at a given moment in time, often accompanied by geolocation information. A number of strategies exist to aggregate such content to produce sentiment scores in order to build a "happiness index". In this paper, we describe such a system based on Twitter that maintains a happiness index for nine US cities. The main contribution of this paper is a companion system called SentireCrowds that allows us to identify the underlying causes behind shifts in sentiment. This ability to analyse the components of the sentiment signal highlights a number of problems. It shows that sentiment scoring on social media data without considering context is difficult. More importantly, it highlights cases where sentiment scoring methods are susceptible to unexpected shifts due to noise and trending memes.
    Scopus© Citations 14  1366
  • Publication
    Tracking the Evolution of Communities in Dynamic Social Networks
    (University College Dublin. School of Computer Science and Informatics, 2011-05)
    Real-world social networks from many domains can naturally be modelled as dynamic graphs. However, approaches for detecting communities have largely focused on identifying communities in static graphs. Therefore, researchers have begun to consider the problem of tracking the evolution of groups of users in dynamic scenarios. Here we describe a model for tracking communities which persist over time in dynamic networks, where each community is characterised by a series of evolutionary events. Based on this model, we propose a scalable community-tracking strategy for efficiently identifying dynamic communities. Evaluations on a large number of synthetic graphs containing embedded evolutionary events demonstrate that this strategy can successfully track communities over time in dynamic networks with different levels of volatility. We then describe experiments to explore the evolving community structures present in real mobile operator networks, represented by monthly call graphs for millions of subscribers.
      361
  • Publication
    Multi-View Clustering for Mining Heterogeneous Social Network Data
    (University College Dublin. School of Computer Science and Informatics, 2009-03)
    Uncovering community structure is a core challenge in social network analysis. This is a significant challenge for large networks where there is a single type of relation in the network (e.g. friend or knows). In practice there may be other types of relation, for instance demographic or geographic information, that also reveal network structure. Uncovering structure in such multi-relational networks presents a greater challenge due to the difficulty of integrating information from different, often discordant views. In this paper we describe a system for performing cluster analysis on heterogeneous multi-view data, and present an analysis of the research themes in a bibliographic literature network, based on the integration of both co-citation links and text similarity relationships between papers in the network.
      63
  • Publication
    An Analysis of Current Trends in CBR Research Using Multi-View Clustering
    (University College Dublin. School of Computer Science and Informatics, 2009-03)
    The European Conference on Case-Based Reasoning (CBR) in 2008 marked 15 years of international and European CBR conferences where almost seven hundred research papers were published. In this report we review the research themes covered in these papers and identify the topics that are active at the moment. The main mechanism for this analysis is a clustering of the research papers based on both co-citation links and text similarity. It is interesting to note that the core set of papers has attracted citations from almost three thousand papers outside the conference collection so it is clear that the CBR conferences are a sub-part of a much larger whole. It is remarkable that the research themes revealed by this analysis do not map directly to the sub-topics of CBR that might appear in a textbook. Instead they reflect the applications-oriented focus of CBR research, and cover the promising application areas and research challenges that are faced.
      114