  • Publication
    Are You Reaching Your Audience? Exploring Item Exposure over Consumer Segments in Recommender Systems
    Many state-of-the-art recommender systems are known to suffer from popularity bias, which means that they have a tendency to recommend items that are already popular, making those items even more popular. This results in the item catalogue being not fully utilised, which is far from ideal from the business’ perspective. Issues of item exposure are actually more complex than simply overexposure of popular items. In this paper we look at the exposure of individual items to different groups of consumers, the item’s audience, and address the question of whether recommender systems reach each item’s potential audience. Thus, we go beyond state-of-the-art analyses that have simply addressed the extent to which items are recommended, regardless of whether they are reaching their target audience. We conduct an empirical study on the MovieLens 20M dataset showing that recommender systems do not fully utilise items’ audiences, and existing sales diversity optimisers do not improve their exposure.
  • Publication
    Exploring Tweet Engagement in the RecSys 2014 Data Challenge
    While much recommender system research has been driven by the rating prediction task, there is an emphasis in recent research on exploring new methods to evaluate the effectiveness of a recommendation. The Recommender Systems Challenge 2014 takes up this theme by challenging researchers to explore engagement as an evaluation criterion. In this paper we discuss how predicting engagement differs from the traditional rating prediction task and motivate the rationale behind our approach to the challenge. We show that standard matrix factorization recommender algorithms do not perform well on the task. Our solution depends on clustering items according to their time-dependent profile to distinguish topical movies from other movies. Our prediction engine also exploits the observation that extreme ratings are more likely to attract engagement.
  • Publication
    Detecting highly overlapping community structure by greedy clique expansion
    In complex networks it is common for each node to belong to several communities, implying a highly overlapping community structure. Recent advances in benchmarking indicate that existing community assignment algorithms that are capable of detecting overlapping communities perform well only when the extent of community overlap is kept to modest levels. To overcome this limitation, we introduce a new community assignment algorithm called Greedy Clique Expansion (GCE). The algorithm identifies distinct cliques as seeds and expands these seeds by greedily optimizing a local fitness function. We perform extensive benchmarks on synthetic data to demonstrate that GCE's good performance is robust across diverse graph topologies. Significantly, GCE is the only algorithm to perform well on these synthetic graphs, in which every node belongs to multiple communities. Furthermore, when put to the task of identifying functional modules in protein interaction data, and college dorm assignments in Facebook friendship data, we find that GCE performs competitively.
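The seed-expansion step described in this abstract can be illustrated with a minimal sketch. It covers only the greedy expansion of a single seed, not clique detection or the removal of near-duplicate communities. The fitness form `k_in / (k_in + k_out)^alpha` is an assumption (a commonly used local fitness function); the abstract does not state which function GCE optimizes. The names `build_graph`, `fitness`, and `expand_seed` are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected graph as a dict of adjacency sets."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def fitness(graph, community, alpha=1.0):
    """Assumed local fitness: k_in / (k_in + k_out) ** alpha.

    k_in counts internal edge endpoints (twice the internal edges),
    k_out counts edges leaving the community.
    """
    k_in = 0
    k_out = 0
    for node in community:
        for nbr in graph[node]:
            if nbr in community:
                k_in += 1
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand_seed(graph, seed, alpha=1.0):
    """Greedily add the neighbouring node that most improves fitness."""
    community = set(seed)
    while True:
        # Nodes adjacent to the community but not yet in it.
        frontier = {n for v in community for n in graph[v]} - community
        best_node, best_score = None, fitness(graph, community, alpha)
        for cand in sorted(frontier):
            score = fitness(graph, community | {cand}, alpha)
            if score > best_score:
                best_node, best_score = cand, score
        if best_node is None:  # no candidate improves the fitness
            return community
        community.add(best_node)


# Toy graph (an assumption for illustration): two 4-cliques joined by edge 3-4.
edges = (list(combinations(range(4), 2))
         + list(combinations(range(4, 8), 2))
         + [(3, 4)])
graph = build_graph(edges)
print(sorted(expand_seed(graph, {0, 1, 2})))  # [0, 1, 2, 3]
```

On this toy graph the triangle seed grows to its full clique and then stops, because adding node 4 would lower the fitness from 12/13 to 14/17.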
  • Publication
    Partially Observable Markov Decision Process Modelling for Assessing Hierarchies
    Hierarchical clustering has been shown to be valuable in many scenarios. Despite its usefulness in many situations, there is no agreed methodology on how to properly evaluate the hierarchies produced from different techniques, particularly in the case where ground-truth labels are unavailable. This motivates us to propose a framework for assessing the quality of hierarchical clustering allocations which covers the case of no ground-truth information. This measurement is useful, e.g., to assess the hierarchical structures used by online retailer websites to display their product catalogues. Our framework is one of the few attempts at hierarchy evaluation from a decision-theoretic perspective. We model the process as a bot searching stochastically for items in the hierarchy and establish a measure representing the degree to which the hierarchy supports this search. We employ Partially Observable Markov Decision Processes (POMDP) to model the uncertainty, the decision making, and the cognitive return for searchers in such a scenario.
  • Publication
    Community Finding in Large Social Networks Through Problem Decomposition
    (University College Dublin. School of Computer Science and Informatics, 2008-08)
    The identification of cohesive communities is a key process in social network analysis. However, the algorithms that are effective for finding communities do not scale well to very large problems, as their time complexity is worse than linear in the number of edges in the graph. This is an important issue for those interested in applying social network analysis techniques to very large networks, such as networks of mobile phone subscribers. In this respect the contributions of this report are two-fold. First we demonstrate these scaling issues using a prominent community-finding algorithm as a case study. We then show that a two-stage process, whereby the network is first decomposed into manageable subnetworks using a multilevel graph partitioning procedure, is effective in finding communities in networks with more than 10^6 nodes.
  • Publication
    Detecting highly overlapping communities with Model-based Overlapping Seed Expansion
    (IEEE Computer Society, 2010-08)
    As research into community finding in social networks progresses, there is a need for algorithms capable of detecting overlapping community structure. Many algorithms have been proposed in recent years that are capable of assigning each node to more than a single community. The performance of these algorithms tends to degrade when the ground-truth contains a more highly overlapping community structure, with nodes assigned to more than two communities. Such highly overlapping structure is likely to exist in many social networks, such as Facebook friendship networks. In this paper we present a scalable algorithm, MOSES, based on a statistical model of community structure, which is capable of detecting highly overlapping community structure, especially when there is variance in the number of communities each node is in. In evaluation on synthetic data MOSES is found to be superior to existing algorithms, especially at high levels of overlap. We demonstrate MOSES on real social network data by analyzing the networks of friendship links between students of five US universities.
  • Publication
    Personalised Diversification Using Intent-Aware Portfolio
    The intent-aware diversification framework considers a set of aspects associated with items to be recommended. A baseline recommendation is greedily re-ranked using an objective that promotes diversity across the aspects. In this paper the framework is analysed and a new intent-aware objective is derived that considers the minimum variance criterion, connecting the framework directly to portfolio diversification from finance. We derive an aspect model that supports the goal of minimum variance and that is faithful to the underlying baseline algorithm. We evaluate diversification capabilities of the proposed method on the MovieLens dataset.
      446 | Scopus© Citations 6
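The greedy re-ranking step that the intent-aware framework builds on can be sketched as follows. This is an illustrative xQuAD-style objective, assumed here for concreteness; it is not the minimum-variance objective derived in the paper. The function name `rerank`, the trade-off parameter `lam`, and the toy scores and aspect probabilities are all hypothetical.

```python
def rerank(baseline, user_aspects, item_aspects, k, lam=0.5):
    """Greedily re-rank baseline scores to spread coverage over aspects.

    baseline:     item -> baseline relevance score
    user_aspects: aspect -> probability the user is interested in it
    item_aspects: item -> {aspect: probability the item satisfies it}
    lam:          0 keeps the baseline order, 1 ranks purely on diversity
    """
    selected = []
    candidates = dict(baseline)
    # uncovered[a]: probability that aspect a is not yet satisfied
    # by any item already selected.
    uncovered = {a: 1.0 for a in user_aspects}

    def objective(item):
        novelty = sum(
            user_aspects[a] * item_aspects[item].get(a, 0.0) * uncovered[a]
            for a in user_aspects
        )
        return (1 - lam) * candidates[item] + lam * novelty

    while candidates and len(selected) < k:
        best = max(candidates, key=objective)
        selected.append(best)
        # Aspects covered by the chosen item contribute less from now on.
        for a in user_aspects:
            uncovered[a] *= 1.0 - item_aspects[best].get(a, 0.0)
        del candidates[best]
    return selected


# Toy data (assumed): two action films outrank one comedy in the baseline.
baseline = {"A": 0.9, "B": 0.8, "C": 0.7}
user_aspects = {"action": 0.5, "comedy": 0.5}
item_aspects = {
    "A": {"action": 1.0},
    "B": {"action": 1.0},
    "C": {"comedy": 1.0},
}
print(rerank(baseline, user_aspects, item_aspects, k=2, lam=0.5))  # ['A', 'C']
print(rerank(baseline, user_aspects, item_aspects, k=2, lam=0.0))  # ['A', 'B']
```

With `lam=0.5` the comedy displaces the second action film because the action aspect is already covered after the first pick; with `lam=0.0` the baseline order is returned unchanged.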
  • Publication
    PDMFRec: A Decentralised Matrix Factorisation with Tunable User-centric Privacy
    Conventional approaches to matrix factorisation (MF) typically rely on a centralised collection of user data for building an MF model. This approach introduces an increased risk when it comes to user privacy. In this short paper we propose an alternative, user-centric, privacy-enhanced, decentralised approach to MF. Our method pushes the computation of the recommendation model to the user’s device, and eliminates the need to exchange sensitive personal information; instead only the loss gradients of local (device-based) MF models need to be shared. Moreover, users can select the amount and type of information to be shared, for enhanced privacy. We demonstrate the effectiveness of this approach by considering different levels of user privacy in comparison with state-of-the-art alternatives.
      513 | Scopus© Citations 32
  • Publication
    Combining Rating and Review Data by Initializing Latent Factor Models with Topic Models for Top-N Recommendation
    Nowadays we commonly have multiple sources of data associated with items. Users may provide numerical ratings, or implicit interactions, but may also provide textual reviews. Although many algorithms have been proposed to jointly learn a model over both interactions and textual data, there is room to improve the many factorization models that are proven to work well on interaction data, but are not designed to exploit textual information. Our focus in this work is to propose a simple, yet easily applicable and effective, method to incorporate review data into such factorization models. In particular, we propose to build the user and item embeddings within the topic space of a topic model learned from the review data. This has several advantages: we observe that initializing the user and item embeddings in topic space leads to faster convergence of the factorization algorithm to a model that outperforms models initialized randomly, or with other state-of-the-art initialization strategies. Moreover, constraining user and item factors to topic space allows for the learning of an interpretable model that users can visualise.
      83 | Scopus© Citations 20
  • Publication
    Learning-to-Rank for Real-Time High-Precision Hashtag Recommendation for Streaming News
    We address the problem of real-time recommendation of streaming Twitter hashtags to an incoming stream of news articles. The technical challenge can be framed as large-scale topic classification where the set of topics (i.e., hashtags) is huge and highly dynamic. Our main applications come from digital journalism, e.g., for promoting original content to Twitter communities and for social indexing of news to enable better retrieval, story tracking and summarisation. In contrast to state-of-the-art methods that focus on modelling each individual hashtag as a topic, we propose a learning-to-rank approach for modelling hashtag relevance, and present methods to extract time-aware features from highly dynamic content. We present the data collection and processing pipeline, as well as our methodology for achieving low latency, high precision recommendations. Our empirical results show that our method outperforms the state-of-the-art, delivering more than 80% precision. Our techniques are implemented in a real-time system, and are currently under user trial with a big news organisation.
      2486 | Scopus© Citations 30