  • Publication
    Learning-to-Rank for Real-Time High-Precision Hashtag Recommendation for Streaming News
    We address the problem of real-time recommendation of streaming Twitter hashtags to an incoming stream of news articles. The technical challenge can be framed as large-scale topic classification where the set of topics (i.e., hashtags) is huge and highly dynamic. Our main applications come from digital journalism, e.g., for promoting original content to Twitter communities and for social indexing of news to enable better retrieval, story tracking and summarisation. In contrast to state-of-the-art methods that focus on modelling each individual hashtag as a topic, we propose a learning-to-rank approach for modelling hashtag relevance, and present methods to extract time-aware features from highly dynamic content. We present the data collection and processing pipeline, as well as our methodology for achieving low-latency, high-precision recommendations. Our empirical results show that our method outperforms the state of the art, delivering more than 80% precision. Our techniques are implemented in a real-time system, and are currently under user trial with a big news organisation.
    Scopus© Citations 30  2446
  • Publication
    Structural Hole Centrality: Evaluating Social Capital through Strategic Network Formation
    (Springer, 2020-09-17)
    Strategic network formation is a branch of network science that takes an economic perspective on the creation of social networks, considering that actors in a network form links in order to maximise some utility that they attain through their connections to other actors in the network. In particular, Jackson’s Connections model writes an actor’s utility as a sum, over all other actors that can be reached along a path in the network, of a benefit value that diminishes with the path length. In this paper, we are interested in the “social capital” that an actor retains due to their position in the network. Social capital can be understood as an ability to bond with actors, as well as an ability to form a bridge that connects otherwise disconnected actors. This bridging benefit has previously been modelled in another “structural hole” network formation game, proposed by Kleinberg. In this paper, we develop an approach that generalises the utility of Kleinberg’s game and combines it with that of the Connections model, to create a utility that models both the bonding and bridging capabilities of an actor with social capital. From this utility and its associated formation game, we derive a new centrality measure, which we dub “structural hole centrality”, to identify actors with high social capital. We analyse this measure by applying it to networks of different types, and assessing its correlation with other centrality metrics, using a benchmark dataset of 299 networks drawn from different domains. Finally, using one social network from the dataset, we illustrate how an actor’s “structural hole centrality profile” can be used to identify their bridging and bonding value to the network.
    Scopus© Citations 10  27
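    The distance-decayed benefit sum that the abstract above attributes to the Connections model can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph stored as an adjacency dictionary and a decay parameter `delta` between 0 and 1; the function name and the example graph are hypothetical. Link costs and the paper's structural hole term are deliberately left out, so this is not the centrality measure the paper derives.

```python
from collections import deque


def connections_benefit(adj, source, delta=0.5):
    """Sum delta**d over every other node reachable from `source`,
    where d is the shortest-path length (found by breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:  # first visit gives the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(delta ** d for node, d in dist.items() if node != source)


# Hypothetical path graph: a - b - c
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(connections_benefit(adj, "a"))  # end node: one neighbour at distance 1, one at distance 2
print(connections_benefit(adj, "b"))  # middle node: two neighbours at distance 1
```

    In this toy graph the middle node collects a larger benefit than either end node, because both of its contacts are direct.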
  • Publication
    Overlapping Stochastic Community Finding
    Community finding in social network analysis is the task of identifying groups of people within a larger population who are more likely to connect to each other than to others in the population. Much existing research has focussed on non-overlapping clustering. However, communities in real-world social networks do overlap. This paper introduces a new community finding method based on overlapping clustering. A Bayesian statistical model is presented, together with a Markov Chain Monte Carlo (MCMC) algorithm, which is evaluated in comparison with two existing overlapping community finding methods that are applicable to large networks. We evaluate our algorithm on networks with thousands of nodes and tens of thousands of edges.
    Scopus© Citations 3  361
  • Publication
    PDMFRec: A Decentralised Matrix Factorisation with Tunable User-centric Privacy
    Conventional approaches to matrix factorisation (MF) typically rely on a centralised collection of user data for building an MF model. This approach introduces an increased risk when it comes to user privacy. In this short paper we propose an alternative, user-centric, privacy-enhanced, decentralised approach to MF. Our method pushes the computation of the recommendation model to the user’s device, and eliminates the need to exchange sensitive personal information; instead, only the loss gradients of local (device-based) MF models need to be shared. Moreover, users can select the amount and type of information to be shared, for enhanced privacy. We demonstrate the effectiveness of this approach by considering different levels of user privacy in comparison with state-of-the-art alternatives.
    Scopus© Citations 25  463
  • Publication
    Personalised Diversification Using Intent-Aware Portfolio
    The intent-aware diversification framework considers a set of aspects associated with items to be recommended. A baseline recommendation is greedily re-ranked using an objective that promotes diversity across the aspects. In this paper, the framework is analysed and a new intent-aware objective is derived that considers the minimum variance criterion, connecting the framework directly to portfolio diversification from finance. We derive an aspect model that supports the goal of minimum variance and that is faithful to the underlying baseline algorithm. We evaluate the diversification capabilities of the proposed method on the MovieLens dataset.
    Scopus© Citations 6  421
  • Publication
    Themecrowds: multiresolution summaries of Twitter usage
    Users of social media sites, such as Twitter, rapidly generate large volumes of text content on a daily basis. Visual summaries are needed to understand what groups of people are saying collectively in this unstructured text data. Users will typically discuss a wide variety of topics, where the number of authors talking about a specific topic can quickly grow or diminish over time, and what the collective is saying about the subject can shift as a situation develops. In this paper, we present a technique that summarises what collections of Twitter users are saying about certain topics over time. As the correct resolution for inspecting the data is unknown in advance, the users are clustered hierarchically over a fixed time interval based on the similarity of their posts. The visualisation technique takes this data structure as its input. Given a topic, it finds the correct resolution of users at each time interval and provides tags to summarise what the collective is discussing. The technique is tested on a large microblogging corpus, consisting of millions of tweets and over a million users.
    Scopus© Citations 35  511
  • Publication
    Exploring Tweet Engagement in the RecSys 2014 Data Challenge
    While much recommender system research has been driven by the rating prediction task, there is an emphasis in recent research on exploring new methods to evaluate the effectiveness of a recommendation. The Recommender Systems Challenge 2014 takes up this theme by challenging researchers to explore engagement as an evaluation criterion. In this paper we discuss how predicting engagement differs from the traditional rating prediction task and motivate the rationale behind our approach to the challenge. We show that standard matrix factorization recommender algorithms do not perform well on the task. Our solution depends on clustering items according to their time-dependent profile to distinguish topical movies from other movies. Our prediction engine also exploits the observation that extreme ratings are more likely to attract engagement.
  • Publication
    Evaluation of Hierarchical Clustering via Markov Decision Processes for Efficient Navigation and Search
    In this paper, we propose a new evaluation measure to assess the quality of a hierarchy in supporting search queries to content collections. The evaluation measure models the scenario of a searcher seeking a particular target item in the hierarchy. It takes into account the structure of the hierarchy by measuring the cognitive challenge of determining the correct path in the hierarchy as well as the reduction in search time afforded by hierarchy. The goal is to propose a general-purpose measure that can be applied in different application contexts, allowing different hierarchical arrangements of content to be quantitatively assessed.
    Scopus© Citations 2  543
  • Publication
    Engineering a Parallel ∆-stepping Algorithm
    Computation of the single-source shortest path (SSSP) is a fundamental primitive in many network analytics tasks. With the increasing size of networks to be analysed, there is a need for efficient tools to compute shortest paths, especially on the widely adopted shared-memory multicore architectures. The ∆-stepping algorithm, which trades off the work efficiency of Dijkstra’s algorithm against the parallelism offered by the Bellman-Ford algorithm, has been found to be among the fastest implementations on various parallel architectures. Despite its widespread popularity, the different design choices in implementing the parallel ∆-stepping algorithm are not properly understood, and these design choices can have a significant impact on the final performance. In this paper, we carefully compare two different implementations of the ∆-stepping algorithm for shared-memory multicore architectures: (i) a static workload assignment, where the nodes are assigned to threads at the beginning of the algorithm and only the assigned thread can relax edges leading to a node, and (ii) a dynamic workload assignment, where the nodes are dynamically allocated to threads at the time of bucket relaxation. Based on an extensive empirical study on a range of graph classes, edge densities and weight distributions, we show that while the more intuitive and widely used approach of dynamically balanced workload suits dense power-law graphs well, the static partitioning approach outperforms this more intuitive approach on a wide range of graph classes. Our findings can guide a network analyst in selecting the best parallel implementation of the ∆-stepping algorithm for a given analytics task and a given graph class.
    Scopus© Citations 3  391
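    For readers unfamiliar with the bucket relaxation mentioned in the abstract above, the following is a minimal sequential sketch of the textbook ∆-stepping scheme. It is not either of the parallel implementations the paper compares; the function name, graph representation (node mapped to a list of `(neighbour, weight)` pairs with non-negative weights) and example graph are assumptions made for illustration. In a parallel version, the relaxation requests generated inside each bucket phase are what would be distributed across threads, statically or dynamically.

```python
import math
from collections import defaultdict


def delta_stepping(graph, source, delta):
    """Sequential sketch of Δ-stepping for non-negative edge weights.
    Bucket i holds nodes whose tentative distance lies in [i*delta, (i+1)*delta)."""
    dist = {}
    buckets = defaultdict(set)

    def relax(v, new_dist):
        old = dist.get(v, math.inf)
        if new_dist < old:
            if old != math.inf:
                buckets[int(old // delta)].discard(v)  # leave the old bucket
            dist[v] = new_dist
            buckets[int(new_dist // delta)].add(v)

    relax(source, 0.0)
    while any(buckets.values()):
        i = min(k for k, b in buckets.items() if b)  # smallest non-empty bucket
        settled = set()
        while buckets[i]:
            frontier = buckets[i]
            buckets[i] = set()
            settled |= frontier
            # Light edges (weight <= delta) may re-insert nodes into bucket i.
            requests = [(v, dist[u] + w)
                        for u in frontier
                        for v, w in graph.get(u, ())
                        if w <= delta]
            for v, d in requests:
                relax(v, d)
        # Heavy edges are relaxed once per node, after the bucket has emptied.
        for u in settled:
            for v, w in graph.get(u, ()):
                if w > delta:
                    relax(v, dist[u] + w)
    return dist


# Hypothetical example graph
graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("c", 6)],
    "b": [("c", 1)],
    "c": [],
}
print(delta_stepping(graph, "s", delta=2))
```

    A small `delta` makes the procedure behave more like Dijkstra’s algorithm (few nodes per bucket), while a large `delta` approaches Bellman-Ford (many nodes relaxed together), which is the trade-off the abstract refers to.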
  • Publication
    Intent-aware Item-based Collaborative Filtering for Personalised Diversification
    Diversity has been identified as one of the key dimensions of recommendation utility that should be considered besides the overall accuracy of the system. A common diversification approach is to rerank results produced by a baseline recommendation engine according to a diversification criterion. The intent-aware framework is one of the frameworks that have been proposed for recommendation diversification. It assumes the existence of a set of aspects associated with items, which also represent user intentions, and the framework promotes diversity across the aspects to address user expectations more accurately. In this paper we consider item-based collaborative filtering and suggest that the traditional view of item similarity lacks a user perspective. We argue that user preferences towards different aspects should be reflected in recommendations produced by the system. We incorporate the intent-aware framework into the item-based recommendation algorithm by injecting personalised intent-aware covariance into the item similarity measure, and explore the impact of this change on the performance of the algorithm. Our experiments show that the proposed method improves both accuracy and diversity of recommendations, offering a better accuracy/diversity trade-off than existing solutions.
    Scopus© Citations 15  585