  • Publication
    ThemeCrowds: Multiresolution Summaries of Twitter Usage
    (University College Dublin. School of Computer Science and Informatics, 2011-06)
    Users of social media sites, such as Twitter, rapidly generate large volumes of text content on a daily basis. Visual summaries are needed to understand what groups of people are saying collectively in this unstructured text data. Users will typically discuss a wide variety of topics, where the number of authors talking about a specific topic can quickly grow or diminish over time, and what the collective is saying about the subject can shift as a situation develops. In this paper, we present a technique that summarises what collections of Twitter users are saying about certain topics over time. As the correct resolution for inspecting the data is unknown in advance, the users are clustered hierarchically over a fixed time interval based on the similarity of their posts. The visualisation technique takes this data structure as its input. Given a topic, it finds the correct resolution of users at each time interval and provides tags to summarise what the collective is discussing. The technique is tested on three microblogging corpora, consisting of up to tens of millions of tweets and over a million users. We provide some preliminary user feedback from a research group interested in the area of social media analysis, where this tool could be applied.
  • Publication
    Are You Reaching Your Audience? Exploring Item Exposure over Consumer Segments in Recommender Systems
    Many state-of-the-art recommender systems are known to suffer from popularity bias: they tend to recommend items that are already popular, making those items even more popular. As a result, the item catalogue is not fully utilised, which is far from ideal from a business perspective. Issues of item exposure are, however, more complex than simple overexposure of popular items. In this paper we look at the exposure of individual items to different groups of consumers, the item's audience, and address the question of whether recommender systems reach each item's potential audience. We thus go beyond state-of-the-art analyses, which have addressed only the extent to which items are recommended, regardless of whether they reach their target audience. We conduct an empirical study on the MovieLens 20M dataset showing that recommender systems do not fully utilise items' audiences, and that existing sales diversity optimisers do not improve their exposure.
  • Publication
    Evaluating Hierarchies through A Partially Observable Markov Decision Processes Methodology
    Hierarchical clustering has been shown to be valuable in many scenarios, such as catalogues, biology research and image processing. Despite its usefulness in many situations, there is no agreed methodology for properly evaluating the hierarchies produced by different techniques, particularly when ground-truth labels are unavailable. This motivates us to propose a framework for assessing the quality of hierarchical clustering allocations that also covers the case where no ground-truth information is available. Such a quality measurement is useful, for example, to assess the hierarchical structures that online retailer websites use to display their product catalogues. Unlike previous measures and metrics, our framework tackles the evaluation from a decision-theoretic perspective. We model the process as a bot searching stochastically for items in the hierarchy and establish a measure representing the degree to which the hierarchy supports this search. We employ Partially Observable Markov Decision Processes (POMDPs) to model the uncertainty, the decision making, and the cognitive return for searchers in such a scenario. In this paper, we discuss the modelling details in full and demonstrate the framework's application on some datasets.
  • Publication
    Community Finding in Large Social Networks Through Problem Decomposition
    (University College Dublin. School of Computer Science and Informatics, 2008-08)
    The identification of cohesive communities is a key process in social network analysis. However, the algorithms that are effective for finding communities do not scale well to very large problems, as their time complexity is worse than linear in the number of edges in the graph. This is an important issue for those interested in applying social network analysis techniques to very large networks, such as networks of mobile phone subscribers. The contributions of this report are twofold. First, we demonstrate these scaling issues using a prominent community-finding algorithm as a case study. We then show that a two-stage process, whereby the network is first decomposed into manageable subnetworks using a multilevel graph partitioning procedure, is effective in finding communities in networks with more than 10^6 nodes.
  • Publication
    Be In The Know: Connecting News Articles to Relevant Twitter Conversations
    In this paper we propose a framework for tracking news articles and automatically connecting them to Twitter conversations, as captured by Twitter hashtags. For example, such a system could alert journalists to news that gets a lot of Twitter reaction, so they can investigate those conversations for new developments in the story, promote their article to a set of interested consumers, or gauge general sentiment towards the story. Mapping articles to hashtags is nevertheless challenging, due to the different language styles of articles and tweets, the streaming nature of the data, and user behavior when marking tweet terms as hashtags. We track the Irish Times RSS feed and a focused Twitter stream over a two-month period, and present a system that assigns hashtags to each article based on its Twitter echo. We propose a machine learning approach for classifying article-hashtag pairs. Our empirical study shows that our system delivers high precision for this task.
  • Publication
    Exploring Tweet Engagement in the RecSys 2014 Data Challenge
    While much recommender system research has been driven by the rating prediction task, recent research has placed an emphasis on exploring new methods to evaluate the effectiveness of a recommendation. The Recommender Systems Challenge 2014 takes up this theme by challenging researchers to explore engagement as an evaluation criterion. In this paper we discuss how predicting engagement differs from the traditional rating prediction task and motivate the rationale behind our approach to the challenge. We show that standard matrix factorization recommender algorithms do not perform well on the task. Our solution relies on clustering items according to their time-dependent profiles to distinguish topical movies from other movies. Our prediction engine also exploits the observation that extreme ratings are more likely to attract engagement.
  • Publication
    Detecting highly overlapping community structure by greedy clique expansion
    In complex networks it is common for each node to belong to several communities, implying a highly overlapping community structure. Recent advances in benchmarking indicate that existing community assignment algorithms that are capable of detecting overlapping communities perform well only when the extent of community overlap is kept to modest levels. To overcome this limitation, we introduce a new community assignment algorithm called Greedy Clique Expansion (GCE). The algorithm identifies distinct cliques as seeds and expands these seeds by greedily optimizing a local fitness function. We perform extensive benchmarks on synthetic data to demonstrate that GCE's good performance is robust across diverse graph topologies. Significantly, GCE is the only algorithm to perform well on these synthetic graphs, in which every node belongs to multiple communities. Furthermore, when put to the task of identifying functional modules in protein interaction data, and college dorm assignments in Facebook friendship data, we find that GCE performs competitively.
  • Publication
    Personalised Ranking with Diversity
    (ACM, 2013-10-16)
    In this paper we discuss a method to incorporate diversity into a personalised ranking objective, in the context of ranking-based recommendation using implicit feedback. The goal is to provide a ranking of items that respects user preferences while also tending to rank diverse items closely together. A prediction formula, given by the product of user and item feature vectors, is learned by minimising the mean squared error objective used previously in the RankALS and RankSGD methods, modified so that the difference in ratings between two items is weighted by the dissimilarity of those items. We report on preliminary experiments with this modified objective, in which the minimisation is carried out using stochastic gradient descent. We show that rankings based on the output of the minimisation succeed in producing recommendation lists with greater diversity, with only a small loss in the relevance of the recommendations, as measured by the error rate.
  • Publication
    Personalised Diversification Using Intent-Aware Portfolio
    The intent-aware diversification framework considers a set of aspects associated with the items to be recommended. A baseline recommendation is greedily re-ranked using an objective that promotes diversity across the aspects. In this paper the framework is analysed and a new intent-aware objective is derived that considers the minimum variance criterion, connecting the framework directly to portfolio diversification from finance. We derive an aspect model that supports the goal of minimum variance and that is faithful to the underlying baseline algorithm. We evaluate the diversification capabilities of the proposed method on the MovieLens dataset.
  • Publication
    Detecting highly overlapping communities with Model-based Overlapping Seed Expansion
    (IEEE Computer Society, 2010-08) ;
    As research into community finding in social networks progresses, there is a need for algorithms capable of detecting overlapping community structure. Many algorithms have been proposed in recent years that are capable of assigning each node to more than a single community. The performance of these algorithms tends to degrade when the ground truth contains a more highly overlapping community structure, with nodes assigned to more than two communities. Such highly overlapping structure is likely to exist in many social networks, such as Facebook friendship networks. In this paper we present a scalable algorithm, MOSES, based on a statistical model of community structure, which is capable of detecting highly overlapping community structure, especially when there is variance in the number of communities each node belongs to. In evaluations on synthetic data, MOSES is found to be superior to existing algorithms, especially at high levels of overlap. We demonstrate MOSES on real social network data by analyzing the networks of friendship links between students of five US universities.
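The Greedy Clique Expansion abstract above describes expanding clique seeds by greedily optimising a local fitness function. The sketch below illustrates only that expansion step. It assumes the fitness takes the form k_in / (k_in + k_out)^alpha, where k_in counts internal edge endpoints and k_out counts edges leaving the community. The abstract does not spell out the function, and GCE's clique seeding and duplicate-community filtering are omitted. The names fitness, expand_seed and build_graph are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as an adjacency map


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build a symmetric adjacency map from an edge list (hypothetical helper)."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def fitness(graph: Graph, community: Set[int], alpha: float = 1.0) -> float:
    """Assumed local fitness: k_in / (k_in + k_out) ** alpha."""
    k_in = 0   # internal edges, counted once from each endpoint
    k_out = 0  # edges with exactly one endpoint inside the community
    for node in community:
        for nbr in graph[node]:
            if nbr in community:
                k_in += 1
            else:
                k_out += 1
    total = k_in + k_out
    return 0.0 if total == 0 else k_in / total ** alpha


def expand_seed(graph: Graph, seed: Set[int], alpha: float = 1.0) -> Set[int]:
    """Greedily add the frontier node that most increases fitness; stop when none does."""
    community = set(seed)
    current = fitness(graph, community, alpha)
    while True:
        frontier = {nbr for node in community for nbr in graph[node]} - community
        best_node, best_score = None, current
        for candidate in sorted(frontier):  # sorted for deterministic tie-breaking
            score = fitness(graph, community | {candidate}, alpha)
            if score > best_score:
                best_node, best_score = candidate, score
        if best_node is None:
            return community
        community.add(best_node)
        current = best_score
```

For example, on two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3), expanding the seed {0, 1, 2} returns it unchanged, because absorbing node 3 lowers the fitness from 6/7 to 8/10. Larger alpha values favour smaller, tighter communities.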
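The Personalised Diversification Using Intent-Aware Portfolio abstract above describes greedily re-ranking a baseline recommendation with an objective that promotes diversity across item aspects. The sketch below illustrates that greedy re-ranking loop. As a stand-in, it uses the well-known xQuAD-style coverage objective, not the minimum-variance portfolio objective the paper derives. The aspect weights p(a | user) and item-aspect probabilities p(item | a) are assumed to be given, and the name xquad_rerank and its parameters are hypothetical.

```python
from typing import Dict, List


def xquad_rerank(
    relevance: Dict[str, float],               # baseline score per item
    aspect_weights: Dict[str, float],          # assumed p(a | user)
    item_aspect: Dict[str, Dict[str, float]],  # assumed p(item | a), keyed by item
    k: int,
    lam: float = 0.5,                          # trade-off: 0 = pure relevance
) -> List[str]:
    """Greedy intent-aware re-ranking with an xQuAD-style objective (a stand-in)."""
    # Probability that each aspect is still not covered by the selected items.
    not_covered = {a: 1.0 for a in aspect_weights}

    def score(item: str) -> float:
        aspects = item_aspect.get(item, {})
        coverage = sum(
            w * aspects.get(a, 0.0) * not_covered[a]
            for a, w in aspect_weights.items()
        )
        return (1.0 - lam) * relevance[item] + lam * coverage

    selected: List[str] = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        # sorted() makes ties resolve deterministically to the first item.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Discount aspects that the chosen item already covers.
        for a in not_covered:
            not_covered[a] *= 1.0 - item_aspect.get(best, {}).get(a, 0.0)
    return selected
```

With lam=0 the list is ordered purely by baseline relevance. Raising lam lets a lower-scored item from an uncovered aspect displace a second item from an aspect that is already covered, which is the behaviour an intent-aware objective is meant to encourage.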