Now showing 1 - 5 of 5
  • Publication
    Combining Rating and Review Data by Initializing Latent Factor Models with Topic Models for Top-N Recommendation
    Nowadays we commonly have multiple sources of data associated with items. Users may provide numerical ratings, or implicit interactions, but may also provide textual reviews. Although many algorithms have been proposed to jointly learn a model over both interactions and textual data, there is room to improve the many factorization models that are proven to work well on interactions data, but are not designed to exploit textual information. Our focus in this work is to propose a simple, yet easily applicable and effective, method to incorporate review data into such factorization models. In particular, we propose to build the user and item embeddings within the topic space of a topic model learned from the review data. This has several advantages: we observe that initializing the user and item embeddings in topic space leads to faster convergence of the factorization algorithm to a model that out-performs models initialized randomly, or with other state-of-the-art initialization strategies. Moreover, constraining user and item factors to topic space allows for the learning of an interpretable model that users can visualise.
      61Scopus© Citations 19
  • Publication
    Reformulations of the Map Equation for Community Finding and Blockmodelling
    Among the many community-finding algorithms that have been proposed in the last decade and more, the Infomapalgorithm of Rosvall and Bergstrom has proven among the best. The algorithm finds good community structure in directed aswell as undirected networks by abstracting information flow inthe network as a random walk. In this paper, we reformulate the objective in terms of the Kullback-Leibler distance between thedistribution of the random walk transitions and that of a modelwalk. The choice of model can be used to constrain the typeof partition that the method extracts. This generalisation makesthe method suitable for extracting other types of meso-structurefrom the network, enabling the analyst to explicitly control thetype of extracted structure.
      316Scopus© Citations 2
  • Publication
    PDMFRec: A Decentralised Matrix Factorisation with Tunable User-centric Privacy
    Conventional approaches to matrix factorisation (MF) typically rely on a centralised collection of user data for building a MF model. This approach introduces an increased risk when it comes to user privacy. In this short paper we propose an alternative, user-centric, privacy enhanced, decentralised approach to MF. Our method pushes the computation of the recommendation model to the user’s device, and eliminates the need to exchange sensitive personal information; instead only the loss gradients of local (device-based) MF models need to be shared. Moreover, users can select the amount and type of information to be shared, for enhanced privacy. We demonstrate the effectiveness of this approach by considering different levels of user privacy in comparison with state-of-the-art alternatives.
      490Scopus© Citations 27
  • Publication
    Analysis of the Semi-synchronous Approach to Large-scale Parallel Community Finding
    Community-finding in graphs is the process of identifying highly cohesive vertex subsets. Recently the vertex-centric approach has been found effective for scalable graph processing and is implemented in systems such as Graph Lab and Pregel. In the vertex-centric approach, the analysis is decomposed into a set of local computations at each vertex of the graph, with results propagated to neighbours along the vertexs edges. Many community finding algorithms area menable to this approach as they are based on the optimisation of an objective through a process of iterative local update (ILU), in which vertices are successively moved to the community of one of their neighbours in order to achieve the highest local gain in the quality of the objective. The sequential processing of such iterative algorithms generally benefits from an asynchronous approach, where a vertex update uses the most recent state as generated by the previous update of vertices in its neighbourhood. When vertices are distributed over a parallel machine, the asynchronous approach can encounter race conditions that impact on its performance and destroy the consistency of the results. Alternatively,a semi-synchronous approach ensures that only non-conflicting vertices are updated simultaneously. In this paper we study the semi-synchronous approach to ILU algorithms for community finding on social networks. Because of the heavy-tailed vertex distribution, the order inwhich vertex updates are applied in asynchronous ILU can greatly impact both convergence time and quality of the found communities. We study the impact of ordering on the distributed label propagation and modularity maximisation algorithms implemented on a shared-memory multicore architecture.We demonstrate that the semi-synchronous ILU approach is competitive in time and quality with the asynchronous approach, while allowing the analyst to maintain consistent control over update ordering. Thus, our implementation results in a more robust and predictable performance and provides control over the order in which the node labels are updated, which is crucial to obtaining the correct trade-off between running time and quality of communities on many graph classes.
      407Scopus© Citations 3
  • Publication
    Engineering a Parallel ∆-stepping Algorithm
    Computation of the single-source shortest path (SSSP) is a fundamental primitive in many network analytics tasks. With the increasing size of networks to beanalysed, there is a need for effcient tools to compute shortest paths, especiallyon the widely adopted shared-memory multicore architectures. The ∆-steppingalgorithm, that trades-off the work effciency of Dijkstra’s algorithm with theparallelism offered by the Bellman-Ford algorithm, has been found to be among thefastest implementations on various parallel architectures. Despite its widespread popularity, the different design choices in implementing the parallel∆-steppingalgorithm are not properly understood and these design choices can have a significant impact on the final performance. In this paper, we carefully comparetwo different implementations of the∆-stepping algorithm for shared-memorymulticore architectures: (i) a static workload assignment where the nodes areassigned to threads at the beginning of the algorithm and only the assigned thread can relax edges leading to a node and (ii) a dynamic workload assignment wherethe nodes are dynamically allocated to threads at the time of bucket relaxation. Based on an extensive empirical study on a range of graph classes, edge density and weight distributions, we show that while the more intuitive and widely used approach of dynamically balanced workload suits dense power-law graphs well,the static partitioning approach outperforms this more intuitive approach on awide range of graph classes. Our findings can guide a network analyst in selecting the best parallel implementation of the ∆-stepping algorithm for a given analytics task and a given graph class.
      445Scopus© Citations 3