  • Publication
    An Analysis of Current Trends in CBR Research Using Multi-View Clustering
    (University College Dublin. School of Computer Science and Informatics, 2009-03)
    The European Conference on Case-Based Reasoning (CBR) in 2008 marked 15 years of international and European CBR conferences where almost seven hundred research papers were published. In this report we review the research themes covered in these papers and identify the topics that are active at the moment. The main mechanism for this analysis is a clustering of the research papers based on both co-citation links and text similarity. It is interesting to note that the core set of papers has attracted citations from almost three thousand papers outside the conference collection so it is clear that the CBR conferences are a sub-part of a much larger whole. It is remarkable that the research themes revealed by this analysis do not map directly to the sub-topics of CBR that might appear in a textbook. Instead they reflect the applications-oriented focus of CBR research, and cover the promising application areas and research challenges that are faced.
  • Publication
    Tracking the Evolution of Communities in Dynamic Social Networks
    (University College Dublin. School of Computer Science and Informatics, 2011-05)
    Real-world social networks from many domains can naturally be modelled as dynamic graphs. However, approaches for detecting communities have largely focused on identifying communities in static graphs. Therefore, researchers have begun to consider the problem of tracking the evolution of groups of users in dynamic scenarios. Here we describe a model for tracking communities which persist over time in dynamic networks, where each community is characterised by a series of evolutionary events. Based on this model, we propose a scalable community-tracking strategy for efficiently identifying dynamic communities. Evaluations on a large number of synthetic graphs containing embedded evolutionary events demonstrate that this strategy can successfully track communities over time in dynamic networks with different levels of volatility. We then describe experiments to explore the evolving community structures present in real mobile operator networks, represented by monthly call graphs for millions of subscribers.
  • Publication
    Taking the pulse of the web : assessing sentiment on topics in online media
    The task of identifying sentiment trends in the popular media has long been of interest to analysts and pundits. Until recently, this task has required professional annotators to manually inspect individual articles in order to identify their polarity. With the increased availability of large volumes of online news content via syndicated feeds, researchers have begun to examine ways to automate aspects of this process. In this work, we describe a sentiment analysis system that uses crowdsourcing to gather non-expert annotations for economic news articles. By using these annotations in conjunction with a supervised machine learning strategy, we can generalize to label a much larger set of articles, allowing us to effectively track sentiment in different news sources over time.
  • Publication
    Exploring the Relationship between Membership Turnover and Productivity in Online Communities
    (Association for the Advancement of Artificial Intelligence, 2014-06-04)
    One of the more disruptive reforms associated with the modern Internet is the emergence of online communities working together on knowledge artefacts such as Wikipedia and OpenStreetMap. Recently it has become clear that these initiatives are vulnerable because of problems with membership turnover. This study presents a longitudinal analysis of 891 Wiki Projects where we model the impact of member turnover and social capital losses on project productivity. By examining social capital losses we attempt to provide a more nuanced analysis of member turnover. In this context social capital is modelled from a social network perspective where the loss of more central members has more impact. We find that only a small proportion of Wiki Projects are in a relatively healthy state with low levels of membership turnover and social capital losses. The results show that the relationship between social capital losses and project performance is U-shaped, and that member withdrawal has a significant negative effect on project outcomes. The results also support the mediation of turnover rate and network density on the curvilinear relationship.
  • Publication
    Viewing the minimum dominating set and maximum coverage problems motivated by "word of mouth marketing" in a problem decomposition context
    (University College Dublin. School of Computer Science and Informatics, 2009)
    Modelling and analyzing the flow of influence is a key challenge in social network analysis. In scenarios where the network is too large to analyze in detail for computational reasons, graph partitioning is a useful aid to decompose the large graph into manageable subgraphs. The question that arises in such a situation is how to partition a given graph such that the solution obtained by combining the solutions from the individual subgraphs is as close as possible to the optimal solution obtained from the full graph (with respect to a particular objective). While graph cuts such as the min cut, ratio cut and normalised cut are a useful aid in breaking down the large problem into tractable subproblems, they may not yield the optimal graph partitioning with respect to a given objective. A natural question that arises in this scenario is “How close is the solution given by the graph cut to that of the optimal partitioning?” or, in other words, “Are the above graph cuts good heuristics?” In this report we pose the above questions with respect to two graph-theoretic problems, namely the minimum dominating set and maximum coverage. We partition the graphs using the normalised cut and present results that suggest that the normalised cut provides a “good partitioning” with respect to the defined objective.
  • Publication
    Linguistically Informed Tweet Categorization for Online Reputation Management
    (Association for Computational Linguistics, 2014-06-27)
    Determining relevant content automatically is a challenging task for any aggregation system. In the business intelligence domain, particularly in the application area of Online Reputation Management, it may be desirable to label tweets as either customer comments which deserve rapid attention or tweets from industry experts or sources regarding the higher-level operations of a particular entity. We present an approach using a combination of linguistic and Twitter-specific features to represent tweets and examine the efficacy of these in distinguishing between tweets which have been labelled using Amazon’s Mechanical Turk crowdsourcing platform. Features such as part-of-speech tags and function words prove highly effective at discriminating between the two categories of tweet related to several distinct entity types, with Twitter-related metrics such as the presence of hash tags, retweets and user mentions also adding to classification accuracy. Accuracy of 86% is reported using an SVM classifier and a mixed set of the aforementioned features on a corpus of tweets related to seven business entities.
  • Publication
    Time Series Clustering of Moodle Activity Data
    Modern computer systems generate large volumes of log data as a matter of course and the analysis of this log data is seen as one of the most promising opportunities in big data analytics. Moodle is a Virtual Learning Environment (VLE) used extensively in third-level education that captures a significant amount of log data on student activity. In this paper we present an analysis of Moodle data that reveals interesting differences in student work patterns. We demonstrate that, by clustering activity profiles represented as time series using Dynamic Time Warping, we can uncover meaningful clusters of students exhibiting similar behaviours. We use these clusters to identify distinct activity patterns among students, such as Procrastinators, Strugglers, and Experts. We see educators as the potential users of a tool that might result from this research and our preliminary analysis does identify scenarios where interventions should be made to help struggling students.
  • Publication
    Distortion as a validation criterion in the identification of suspicious reviews
    (University College Dublin. School of Computer Science and Informatics, 2010-05-02)
    Assessing the trustworthiness of reviews is a key issue for the maintainers of opinion sites such as TripAdvisor. In this paper we propose a distortion criterion for assessing the impact of methods for uncovering suspicious hotel reviews in TripAdvisor. The principle is that dishonest reviews will distort the overall popularity ranking for a collection of hotels. Thus a mechanism that deletes dishonest reviews will distort the popularity ranking significantly, when compared with the removal of a similar set of reviews at random. This distortion can be quantified by comparing popularity rankings before and after deletion, using rank correlation. We present an evaluation of this strategy in the assessment of shill detection mechanisms on a dataset of hotel reviews collected from TripAdvisor.
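The distortion idea in this abstract can be illustrated with a minimal sketch. It assumes that popularity is simply the mean rating per hotel (the ranking actually used by the site is not described here), that rank correlation is Spearman's coefficient, and that distortion is reported as one minus that coefficient. The function names, the `(hotel, rating)` review format, and the toy data are hypothetical; the sketch also assumes every hotel keeps at least one review and that the rankings are not all tied.

```python
from statistics import mean

def average_ranks(values):
    """Rank values (1 = largest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def popularity(reviews, hotels):
    """Mean rating per hotel (assumed popularity score), in `hotels` order."""
    return [mean(r for h, r in reviews if h == hotel) for hotel in hotels]

def distortion(reviews, removed, hotels):
    """1 - rank correlation between rankings before and after deletion."""
    kept = [rev for i, rev in enumerate(reviews) if i not in removed]
    return 1.0 - spearman(popularity(reviews, hotels), popularity(kept, hotels))

# Hypothetical toy data: (hotel, rating) pairs.
reviews = [("A", 5), ("A", 5), ("A", 1), ("B", 4), ("B", 4), ("C", 3), ("C", 3)]
hotels = ["A", "B", "C"]
targeted = distortion(reviews, {2}, hotels)   # delete the outlying review of A
baseline = distortion(reviews, {5}, hotels)   # delete an unremarkable review of C
```

In this toy case the targeted deletion changes the ranking while the baseline deletion does not, which is the kind of contrast the criterion is meant to expose when a detection method is compared with random removal.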
  • Publication
    Community Finding in Large Social Networks Through Problem Decomposition
    (University College Dublin. School of Computer Science and Informatics, 2008-08)
    The identification of cohesive communities is a key process in social network analysis. However, the algorithms that are effective for finding communities do not scale well to very large problems, as their time complexity is worse than linear in the number of edges in the graph. This is an important issue for those interested in applying social network analysis techniques to very large networks, such as networks of mobile phone subscribers. In this respect the contributions of this report are two-fold. First we demonstrate these scaling issues using a prominent community-finding algorithm as a case study. We then show that a two-stage process, whereby the network is first decomposed into manageable subnetworks using a multilevel graph partitioning procedure, is effective in finding communities in networks with more than 10^6 nodes.
  • Publication
    A Case-Study on the Impact of Dynamic Time Warping in Time Series Regression
    It is well understood that Dynamic Time Warping (DTW) is effective in revealing similarities between time series that do not align perfectly. In this paper, we illustrate this on spectroscopy time-series data. We show that DTW is effective in improving accuracy on a regression task when only a single wavelength is considered. When combined with k-Nearest Neighbour, DTW has the added advantage that it can reveal similarities and differences between samples at the level of the time-series. However, in the problem we consider here, data is available across a spectrum of wavelengths. If aggregate statistics (means, variances) are used across many wavelengths the benefits of DTW are no longer apparent. We present this as another example of a situation where big data trumps sophisticated models in Machine Learning.
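For readers unfamiliar with the combination described in this abstract, the following is a minimal sketch of DTW used as the distance in a k-Nearest Neighbour regressor. It is the textbook dynamic-programming formulation with an absolute-difference local cost and no warping window, not necessarily the configuration used in the paper; the function names and the toy series are hypothetical.

```python
from math import inf

def dtw_distance(a, b):
    """Textbook DTW distance, O(len(a) * len(b)), absolute-difference cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def knn_regress(query, train, k=1):
    """Predict the mean target of the k training series nearest under DTW."""
    nearest = sorted(train, key=lambda pair: dtw_distance(query, pair[0]))[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical toy data: (series, target) pairs.
train = [([0, 1, 2, 1, 0], 10.0), ([5, 5, 5, 5, 5], 50.0)]
shifted = [0, 0, 1, 2, 1, 0]  # same shape as the first series, delayed by one step
prediction = knn_regress(shifted, train, k=1)
```

Because DTW lets one point align with several neighbours, the delayed series is at distance zero from the first training series, whereas a point-by-point comparison could not even be applied to series of different lengths.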