  • Publication
    Adaptive Representations for Tracking Breaking News on Twitter
    Twitter is often the most up-to-date source for finding and tracking breaking news stories. Therefore, there is considerable interest in developing filters for tweet streams in order to track and summarize stories. This is a non-trivial text analytics task as tweets are short, and standard text similarity metrics often fail as stories evolve over time. In this paper we examine the effectiveness of adaptive text similarity mechanisms for tracking and summarizing breaking news stories. We evaluate the effectiveness of these mechanisms on a number of recent news events for which manually curated timelines are available. Assessments based on the ROUGE metric indicate that an adaptive similarity mechanism is best suited for tracking evolving stories on Twitter.
  • Publication
    Dimensionality Reduction and Visualisation Tools for Voting Record
    (CEUR Workshop Proceedings, 2016-09-21)
    Recorded votes in legislative bodies are an important source of data for political scientists. Voting records can be used to describe parliamentary processes, identify ideological divides between members and reveal the strength of party cohesion. We explore the problem of working with vote data using popular dimensionality reduction techniques and cluster validation methods, as an alternative to more traditional scaling techniques. We present results of dimensionality reduction techniques applied to votes from the 6th and 7th European Parliaments, covering activity from 2004 to 2014.
  • Publication
    An Analysis of Current Trends in CBR Research Using Multi-View Clustering
    (University College Dublin. School of Computer Science and Informatics, 2009-03)
    The European Conference on Case-Based Reasoning (CBR) in 2008 marked 15 years of international and European CBR conferences where almost seven hundred research papers were published. In this report we review the research themes covered in these papers and identify the topics that are active at the moment. The main mechanism for this analysis is a clustering of the research papers based on both co-citation links and text similarity. It is interesting to note that the core set of papers has attracted citations from almost three thousand papers outside the conference collection so it is clear that the CBR conferences are a sub-part of a much larger whole. It is remarkable that the research themes revealed by this analysis do not map directly to the sub-topics of CBR that might appear in a textbook. Instead they reflect the applications-oriented focus of CBR research, and cover the promising application areas and research challenges that are faced.
  • Publication
    MeetupNet Dublin: Discovering Communities in Dublin's Meetup Network
    (CEUR Workshop Proceedings, 2018-12-07)
    Meetup.com is a global online platform which facilitates the organisation of meetups in different parts of the world. A meetup group typically focuses on one specific topic of interest, such as sports, music, language, or technology. However, many users of this platform attend multiple meetups. On this basis, we can construct a co-membership network for a given location. This network encodes how pairs of meetups are connected to one another via common members. In this work we demonstrate that, by applying techniques from social network analysis to this type of representation, we can reveal the underlying meetup community structure, which is not immediately apparent from the platform's website. Specifically, we map the landscape of Dublin's meetup communities, to explore the interests and activities of meetup.com users in the city.
  • Publication
    Identifying representative textual sources in blog networks
    (University College Dublin. School of Computer Science and Informatics, 2011-02)
    We apply methods from social network analysis and visualization to facilitate a study of the Irish blogosphere from a cultural studies perspective. We focus on solving the practical issues that arise when the goal is to perform textual analysis of the corpus produced by a network of bloggers. Previous studies into blogging networks have noted difficulties arising when trying to identify the extent and boundaries of these networks. As a response to calls for increasingly data-led approaches in media and cultural studies, we discuss a variety of social network analysis methods that can be used to identify which blogs can be seen as members of a posited "Irish blogging network". We identify hub blogs, communities of sites corresponding to different topics, and representative bloggers within these communities. Based on this study, we propose a set of analysis guidelines for researchers who wish to map out blogging networks.
  • Publication
    Cross-Correlation Template Matching for Liver Localisation in Computed Tomography
    Many of the current approaches to automatic organ localisation in medical imaging require a large amount of labelled patient data to train systems to accurately identify specific anatomical features. Cross-correlation, also known as template matching, is a statistical method of assessing the similarity between a template image and a target image. A modified version of this method is presented here to localise the liver in Computed Tomography volume images in the coronal and sagittal planes, achieving mean positioning errors of approximately 11 mm and 20 mm respectively, using between 1 and 25 datasets to create the template liver.
  • Publication
    Handling Noisy Constraints in Semi-supervised Overlapping Community Finding
    Community structure is an essential property that helps us to understand the nature of complex networks. Since algorithms for detecting communities are unsupervised in nature, they can fail to uncover useful groupings, particularly when the underlying communities in a network are highly overlapping [1]. Recent work has sought to address this via semi-supervised learning [2], using a human annotator or “oracle” to provide limited supervision. This knowledge is typically encoded in the form of must-link and cannot-link constraints, which indicate that a pair of nodes should always be or should never be assigned to the same community. In this way, we can uncover communities which are otherwise difficult to identify via unsupervised techniques. However, in real semi-supervised learning applications, human supervision may be unreliable or “noisy”, relying on subjective decision making [3]. Annotators can disagree with one another, they might only have limited knowledge of a domain, or they might simply complete a labeling task incorrectly due to the burden of annotation. Thus, we might reasonably expect that the pairwise constraints used in a real semi-supervised community detection task could be imperfect or conflicting. The aim of this study is to explore the effect of noisy, incorrectly-labeled constraints on the performance of semi-supervised community finding algorithms for overlapping networks. Furthermore, we propose an approach to mitigate such cases in real-world network analysis tasks. We treat noisy pairwise constraints as anomalies, and use an autoencoder, a commonly used method in the domain of anomaly detection, to identify such constraints. Initial experiments on synthetic networks demonstrate the usefulness of this approach.
  • Publication
    Detecting Attention Dominating Moments Across Media Types
    (CEUR Workshop Proceedings, 2016-03-20)
    In this paper we address the problem of identifying attention dominating moments in online media. We are interested in discovering moments when everyone seems to be talking about the same thing. We investigate one particular aspect of breaking news: the tendency of multiple sources to concentrate attention on a single topic, leading to a collapse in diversity of content for a period of time. In this work we show that diversity at a topic level is effective for capturing this effect in blogs, in news articles, and on Twitter. The phenomenon is present in three distinctly different media types, each with their own unique features. We describe the phenomenon using case studies relating to major news stories from September 2015.
  • Publication
    A Spectral Co-Clustering Approach for Dynamic Data
    (University College Dublin. School of Computer Science and Informatics, 2011-08)
    A common task in many domains with a temporal aspect involves identifying and tracking clusters over time. Often dynamic data will have a feature-based representation. In some cases, a direct mapping will exist for both objects and features over time. But in many scenarios, smaller subsets of objects or features alone will persist across successive time periods. To address this issue, we propose a dynamic spectral co-clustering algorithm for simultaneously clustering objects and features over time, as represented by a set of related bipartite graphs. We evaluate the algorithm on several synthetic datasets, a benchmark text corpus, and social bookmarking data.
  • Publication
    Spectral co-clustering for dynamic bipartite graphs
    (Sun SITE Central Europe (CEUR), 2010-09-24)
    A common task in many domains with a temporal aspect involves identifying and tracking clusters over time. Often dynamic data will have a feature-based representation. In some cases, a direct mapping will exist for both objects and features over time. But in many scenarios, smaller subsets of objects or features alone will persist across successive time periods. To address this issue, we propose a dynamic spectral co-clustering method for simultaneously clustering objects and features over time, as represented by successive bipartite graphs. We evaluate the method on a benchmark text corpus and Web 2.0 tagging data.
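
The last two abstracts describe clustering objects and features simultaneously on bipartite graphs. As a rough illustration of the static building block only, and not the authors' dynamic algorithm, the sketch below performs a two-way spectral co-clustering of a single object-by-feature matrix using the second singular vectors of the degree-normalised matrix. The function name `spectral_cocluster_two_way`, the toy matrix, and the use of a simple sign threshold in place of k-means are all assumptions made for this example.

```python
import numpy as np


def spectral_cocluster_two_way(A):
    """Split the rows and columns of a non-negative matrix into two co-clusters.

    Assumes every row and every column of A has a positive sum.
    Returns (row_labels, col_labels), each an array of 0/1 labels.
    """
    A = np.asarray(A, dtype=float)
    row_deg = A.sum(axis=1)  # degree of each object node
    col_deg = A.sum(axis=0)  # degree of each feature node

    # Degree-normalised matrix D1^{-1/2} A D2^{-1/2}.
    An = A / np.sqrt(np.outer(row_deg, col_deg))

    U, _, Vt = np.linalg.svd(An, full_matrices=False)

    # The second singular vectors carry the two-way partition.
    # Rescale by degree, then threshold at zero (a simplification;
    # k-means on these values is a common alternative).
    z_rows = U[:, 1] / np.sqrt(row_deg)
    z_cols = Vt[1, :] / np.sqrt(col_deg)
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)


# Hypothetical 4 objects x 4 features matrix with two weakly linked blocks.
A = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [0, 0, 4, 5],
])
rows, cols = spectral_cocluster_two_way(A)
print("object labels: ", rows)
print("feature labels:", cols)
```

The label values themselves are arbitrary, because the sign of a singular vector is not fixed; only the grouping matters. Objects and features that receive the same label belong to the same co-cluster.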