  • Publication
    Temporal Analysis of Reddit Networks via Role Embeddings
    Inspired by diachronic word analysis from the field of natural language processing, we propose an approach for uncovering temporal insights regarding user roles in social networks using graph embedding methods. Specifically, we apply the role embedding algorithm struc2vec to a collection of social networks, derived from the popular online social news aggregation website Reddit, that exhibit either “loyal” or “vagrant” characteristics. For each subreddit, we extract nine months of data and create network role embeddings on consecutive time windows. By aligning the resulting temporal embedding spaces, we can then compare and contrast how user roles change over time. In particular, we analyse temporal role embeddings from individual-level and community-level perspectives for both loyal and vagrant communities on Reddit.
  • Publication
    On Supporting Digital Journalism: Case Studies in Co-Designing Journalistic Tools
    Since 2013, researchers in the Insight Centre for Data Analytics at University College Dublin have been involved in a significant research programme in digital journalism, specifically targeting tools and social media guidelines to support the work of journalists. Most of this programme was undertaken in collaboration with The Irish Times. The collaboration involved identifying key problems currently faced by digital journalists, developing tools to address these problems, and then iteratively co-designing these tools with feedback from journalists. This paper reports on our experiences and the lessons learned from this research programme, with a view to informing similar efforts in the future.
  • Publication
    MeetupNet Dublin: Discovering Communities in Dublin's Meetup Network
    (CEUR Workshop Proceedings, 2018-12-07)
    Meetup.com is a global online platform which facilitates the organisation of meetups in different parts of the world. A meetup group typically focuses on one specific topic of interest, such as sports, music, language, or technology. However, many users of this platform attend multiple meetups. On this basis, we can construct a co-membership network for a given location. This network encodes how pairs of meetups are connected to one another via common members. In this work we demonstrate that, by applying techniques from social network analysis to this type of representation, we can reveal the underlying meetup community structure, which is not immediately apparent from the platform's website. Specifically, we map the landscape of Dublin's meetup communities, to explore the interests and activities of meetup.com users in the city.
  • Publication
    Dimensionality Reduction and Visualisation Tools for Voting Record
    (CEUR Workshop Proceedings, 2016-09-21)
    Recorded votes in legislative bodies are an important source of data for political scientists. Voting records can be used to describe parliamentary processes, identify ideological divides between members, and reveal the strength of party cohesion. We explore the problem of working with vote data using popular dimensionality reduction techniques and cluster validation methods, as an alternative to more traditional scaling techniques. We present the results of applying these dimensionality reduction techniques to votes from the 6th and 7th European Parliaments, covering activity from 2004 to 2014.
  • Publication
    Score Normalization and Aggregation for Active Learning in Multi-label Classification
    (University College Dublin. School of Computer Science and Informatics, 2010-02)
    Active learning is useful in situations where labeled data is scarce, unlabeled data is available, and labeling a large number of examples is costly or impractical. These techniques help by identifying a minimal set of examples to label that will support the training of an effective classifier. Active learning is therefore particularly relevant for automating annotation tasks in multimedia. In this paper we consider the problem of employing active learning for the assignment of multiple annotations or “tags” to images in personal image collections. This form of multi-label classification has received considerable attention in recent years; however, active multi-label classification is still a new research area. The main challenge in active multi-label classification is the selection of unlabeled examples that will be informative for all tags under consideration. This selection task proves surprisingly difficult, primarily because of the paucity of available labeled data. In this paper we present some solutions to this problem based on aggregated rankings from classifiers for individual tags.
  • Publication
    Synthetic Dataset Generation for Online Topic Modeling
    Online topic modeling allows for the discovery of the underlying latent structure in a real-time stream of data. When evaluating such approaches, it is common to choose a static value for the number of topics. However, we would expect the number of topics to vary over time due to changes in the underlying structure of the data, known as concept drift and concept shift. We propose a semi-synthetic dataset generator, which can introduce concept drift and concept shift into existing annotated non-temporal datasets, via user-controlled parameterization. This allows for the creation of multiple different artificial streams of data, where the “correct” number and composition of the topics is known at each point in time. We demonstrate how these generated datasets can be used as an evaluation strategy for online topic modeling approaches.
  • Publication
    ThemeCrowds: Multiresolution Summaries of Twitter Usage
    (University College Dublin. School of Computer Science and Informatics, 2011-06)
    Users of social media sites, such as Twitter, rapidly generate large volumes of text content on a daily basis. Visual summaries are needed to understand what groups of people are saying collectively in this unstructured text data. Users will typically discuss a wide variety of topics, where the number of authors talking about a specific topic can quickly grow or diminish over time, and what the collective is saying about the subject can shift as a situation develops. In this paper, we present a technique that summarises what collections of Twitter users are saying about certain topics over time. As the correct resolution for inspecting the data is unknown in advance, the users are clustered hierarchically over a fixed time interval based on the similarity of their posts. The visualisation technique takes this data structure as its input. Given a topic, it finds the correct resolution of users at each time interval and provides tags to summarise what the collective is discussing. The technique is tested on three microblogging corpora, consisting of up to tens of millions of tweets and over a million users. We provide some preliminary user feedback from a research group interested in the area of social media analysis, where this tool could be applied.
  • Publication
    Taking the pulse of the web: assessing sentiment on topics in online media
    The task of identifying sentiment trends in the popular media has long been of interest to analysts and pundits. Until recently, this task has required professional annotators to manually inspect individual articles in order to identify their polarity. With the increased availability of large volumes of online news content via syndicated feeds, researchers have begun to examine ways to automate aspects of this process. In this work, we describe a sentiment analysis system that uses crowdsourcing to gather non-expert annotations for economic news articles. By using these annotations in conjunction with a supervised machine learning strategy, we can generalize to label a much larger set of articles, allowing us to effectively track sentiment in different news sources over time.
  • Publication
    Distortion as a validation criterion in the identification of suspicious reviews
    (University College Dublin. School of Computer Science and Informatics, 2010-05-02)
    Assessing the trustworthiness of reviews is a key issue for the maintainers of opinion sites such as TripAdvisor. In this paper we propose a distortion criterion for assessing the impact of methods for uncovering suspicious hotel reviews in TripAdvisor. The principle is that dishonest reviews will distort the overall popularity ranking for a collection of hotels. Thus, a mechanism that deletes dishonest reviews will change the popularity ranking significantly more than the removal of a similarly sized set of reviews at random. This distortion can be quantified by comparing popularity rankings before and after deletion, using rank correlation. We present an evaluation of this strategy in the assessment of shill detection mechanisms on a dataset of hotel reviews collected from TripAdvisor.
  • Publication
    An Investigation into Information Navigation via Diverse Keyword-based Facets
    In the age of information overload, it is necessary to provide effective information navigation tools that operate over unstructured textual data. Current state-of-the-art methods offer limited exploration capabilities for various information-seeking tasks, especially those arising in domains such as online journalism. Here we argue for improvements in faceted search systems, via new strategies for identifying keyword-based facets. Our proposed technique utilises a PageRank model operating over the graph of terms appearing in documents, while also employing novel methods for biasing significant terms and named entities. In addition, we consider the notion of diversity within extracted keywords in an effort to maximise coverage over a range of topics. We perform experimental evaluations over issues relevant to the 2016 Irish General Election, demonstrating the superior performance of our proposed technique.
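The MeetupNet Dublin abstract describes a co-membership network in which pairs of meetups are connected through their common members. A minimal Python sketch of that construction follows. It assumes that an edge's weight is simply the number of shared members (the abstract does not say how edges are weighted), and the member ids and group names are hypothetical. The community detection step that reveals the community structure is not shown.

```python
from collections import Counter
from itertools import combinations


def co_membership_network(memberships):
    """Build a weighted co-membership network between meetup groups.

    memberships: dict mapping a member id to the groups that member joined.
    Returns a Counter keyed by (group_a, group_b) pairs, sorted alphabetically,
    whose value is the number of members the two groups have in common.
    """
    edges = Counter()
    for groups in memberships.values():
        # Each member links every pair of groups they belong to.
        for a, b in combinations(sorted(set(groups)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy data; real input would be meetup.com membership lists.
memberships = {
    "u1": {"python", "data-science"},
    "u2": {"python", "data-science", "running"},
    "u3": {"running", "hiking"},
}
network = co_membership_network(memberships)
print(network[("data-science", "python")])  # 2
```

The resulting weighted edge list could then be handed to any community detection algorithm.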
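The active multi-label classification abstract proposes choosing examples to label from aggregated rankings produced by per-tag classifiers. The sketch below is one assumed instance of score normalisation and aggregation, not the paper's method: each tag's informativeness scores are min-max normalised to [0, 1], averaged across tags, and the top-k examples are proposed for labelling. The tag names, image ids, and scores are hypothetical.

```python
def min_max_normalise(scores):
    """Rescale one tag's scores to [0, 1] so that different tags are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {x: 0.0 for x in scores}
    return {x: (s - lo) / (hi - lo) for x, s in scores.items()}


def select_for_labelling(per_tag_scores, k):
    """Pick k unlabeled examples that are informative across all tags.

    per_tag_scores: dict tag -> dict example_id -> informativeness score,
    where a higher score means more informative for that tag's classifier.
    """
    normalised = [min_max_normalise(s) for s in per_tag_scores.values()]
    examples = set().union(*(s.keys() for s in normalised))
    # Aggregate by averaging the normalised scores over tags (one simple choice).
    aggregate = {
        x: sum(s.get(x, 0.0) for s in normalised) / len(normalised)
        for x in examples
    }
    # Highest aggregate first; ties broken by example id for determinism.
    return sorted(examples, key=lambda x: (-aggregate[x], x))[:k]


# Hypothetical scores: the "people" classifier reports on a much larger scale.
per_tag_scores = {
    "beach": {"img1": 0.9, "img2": 0.1, "img3": 0.5},
    "people": {"img1": 10.0, "img2": 40.0, "img3": 30.0},
}
print(select_for_labelling(per_tag_scores, 2))  # ['img3', 'img1']
```

Without the normalisation step, the raw "people" scores would dominate the average and img2 would be chosen first, even though it is uninformative for the "beach" tag.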
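The online topic modeling abstract describes injecting concept drift and concept shift into an annotated, non-temporal corpus so that the true topics are known at every time step. A minimal sketch follows, assuming that drift means a topic's mixing weight changes gradually and shift means a topic appears or disappears abruptly. The function names, the schedule format, and the toy corpus are hypothetical; this is not the generator from the paper.

```python
import random


def linear_drift(topic_a, topic_b, steps):
    """Gradual drift: mixing weight moves from topic_a to topic_b (steps >= 2)."""
    return [
        {topic_a: 1 - i / (steps - 1), topic_b: i / (steps - 1)}
        for i in range(steps)
    ]


def generate_stream(docs_by_topic, schedule, docs_per_step, seed=0):
    """Yield (step, docs, true_topics) for each time step in the schedule.

    docs_by_topic: topic label -> list of annotated documents.
    schedule: one dict per step mapping topic -> mixing weight; every step
    needs at least one positive weight.
    """
    rng = random.Random(seed)
    for step, weights in enumerate(schedule):
        active = sorted(t for t, w in weights.items() if w > 0)
        # Sample topic labels in proportion to their weights, then a document for each.
        labels = rng.choices(active, weights=[weights[t] for t in active], k=docs_per_step)
        docs = [(t, rng.choice(docs_by_topic[t])) for t in labels]
        yield step, docs, active


docs_by_topic = {"sport": ["s1", "s2"], "politics": ["p1", "p2"], "tech": ["t1"]}
# Drift from sport to politics over three steps, then an abrupt shift adding tech.
schedule = linear_drift("sport", "politics", 3) + [{"politics": 1, "tech": 1}]
for step, docs, topics in generate_stream(docs_by_topic, schedule, docs_per_step=4):
    print(step, topics)
```

Here `true_topics` is the set of topics the step was generated from; with a small `docs_per_step`, a particular sample may not contain a document from every active topic.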
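The distortion criterion from "Distortion as a validation criterion in the identification of suspicious reviews" can be made concrete as follows. This Python sketch assumes hotels are ranked by mean rating (a simplification of TripAdvisor's popularity ranking) and uses Spearman's rank correlation, since the abstract says only "rank correlation". The reviews, the flagged indices, and all function names are hypothetical.

```python
import random
from statistics import mean


def popularity_ranking(reviews):
    """Rank hotels by mean rating, best first, ties broken by name.

    reviews: list of (hotel, rating). Mean rating is a simplifying stand-in
    for the site's own popularity ranking.
    """
    by_hotel = {}
    for hotel, rating in reviews:
        by_hotel.setdefault(hotel, []).append(rating)
    return sorted(by_hotel, key=lambda h: (-mean(by_hotel[h]), h))


def spearman(rank_a, rank_b):
    """Spearman's rho between two strict rankings of the same items."""
    n = len(rank_a)
    if n < 2:
        return 1.0
    pos_b = {item: i for i, item in enumerate(rank_b)}
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


def distortion(reviews, flagged, trials=200, seed=0):
    """Return (rho after deleting flagged reviews, mean rho after random deletions)."""
    before = popularity_ranking(reviews)

    def rho_after(removed):
        after = popularity_ranking([r for i, r in enumerate(reviews) if i not in removed])
        kept = set(after)  # hotels that still have at least one review
        return spearman([h for h in before if h in kept], after)

    rng = random.Random(seed)
    baseline = mean(
        rho_after(set(rng.sample(range(len(reviews)), len(flagged))))
        for _ in range(trials)
    )
    return rho_after(set(flagged)), baseline


reviews = [
    ("A", 4), ("A", 4), ("A", 5),
    ("B", 4), ("B", 3), ("B", 4),
    ("C", 2), ("C", 2), ("C", 5), ("C", 5), ("C", 5),
]
flagged = [8, 9, 10]  # hypothetical detector output: C's three 5-star reviews
rho_flagged, rho_random = distortion(reviews, flagged)
print(rho_flagged)  # 0.5
print(rho_random)
```

A lower rho means more distortion, so under the principle stated in the abstract, a useful detector should produce a `rho_flagged` noticeably below the random-deletion baseline.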
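The keyword-facet abstract mentions a PageRank model over the graph of terms appearing in documents, with biasing of significant terms and named entities. As an illustration only, the sketch below builds a term co-occurrence graph within a fixed token window (the window size is an assumption), runs PageRank by power iteration, and models biasing as a personalised teleport vector. This is a common way to bias PageRank, not necessarily the paper's technique, and the diversity step is omitted. The toy documents are hypothetical.

```python
from collections import defaultdict


def term_graph(docs, window=2):
    """Undirected graph linking terms that occur within `window` tokens of each other."""
    graph = defaultdict(set)
    for tokens in docs:
        for i, term in enumerate(tokens):
            for other in tokens[i + 1:i + 1 + window]:
                if other != term:
                    graph[term].add(other)
                    graph[other].add(term)
    return graph


def pagerank(graph, damping=0.85, bias=None, iters=50):
    """Power-iteration PageRank; `bias` optionally up-weights chosen terms' teleport probability."""
    nodes = sorted(graph)
    weights = {n: (bias or {}).get(n, 1.0) for n in nodes}
    total = sum(weights.values())
    teleport = {n: w / total for n, w in weights.items()}
    rank = dict(teleport)
    for _ in range(iters):
        # Every node has at least one neighbour, so no rank mass is lost.
        rank = {
            n: (1 - damping) * teleport[n]
            + damping * sum(rank[m] / len(graph[m]) for m in graph[n])
            for n in nodes
        }
    return rank


docs = [
    ["election", "dublin", "count"],
    ["election", "turnout", "dublin"],
    ["election", "candidate"],
]
scores = pagerank(term_graph(docs))
top = sorted(scores, key=lambda t: (-scores[t], t))[:3]
print(top)  # ['election', 'dublin', 'count']
```

Passing, for example, `bias={"candidate": 10.0}` raises the score of a term flagged as a named entity without changing the graph itself.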