  • Publication
    Latent space models for multiview network data
    (Institute of Mathematical Studies, 2017-09)
    Social relationships consist of interactions along multiple dimensions. In social networks, this means that individuals form multiple types of relationships with the same person (an individual will not trust all of his/her acquaintances, for example). Statistical models for these data require understanding two related types of dependence structure: (i) structure within each relationship type, or network view, and (ii) the association between views. In this paper we propose a statistical framework that parsimoniously represents dependence between relationship types while also maintaining enough flexibility to allow individuals to serve different roles in different relationship types. Our approach builds on work on latent space models for networks (see Hoff et al. (2002), for example). These models represent the propensity for two individuals to form edges as conditionally independent given the distance between the individuals in an unobserved social space. Our work departs from previous work in this area by representing dependence structure between network views through a Multivariate Bernoulli likelihood, providing a representation of between-view association. This approach infers correlations between views not explained by the latent space model. Using our method, we explore 6 multiview network structures across 75 villages in rural southern Karnataka, India (Banerjee et al., 2013).
      330 | Scopus© Citations 27
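    The latent space idea referenced in the abstract above can be illustrated with a minimal sketch of the single-view latent distance model of Hoff et al. (2002), in which the log-odds of an edge equal an intercept minus the distance between two actors' latent positions. This is not the multiview model the paper proposes; the function names, the intercept `alpha`, and the example positions are illustrative assumptions.

```python
import math
import random


def edge_probability(z_i, z_j, alpha):
    """Edge probability under logit(p) = alpha - ||z_i - z_j||.

    z_i, z_j are latent positions (sequences of floats); alpha is an
    intercept. All names here are hypothetical, chosen for illustration.
    """
    distance = math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-(alpha - distance)))


def simulate_network(positions, alpha, seed=0):
    """Draw an undirected network with edges conditionally independent
    given the latent positions."""
    rng = random.Random(seed)
    n = len(positions)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = edge_probability(positions[i], positions[j], alpha)
            if rng.random() < p:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


# Example positions (assumed): two actors close together, one far away.
positions = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
network = simulate_network(positions, alpha=1.0)
```

    Under this sketch, actors that are close in the latent space are more likely to be linked than distant ones, which is the property the abstract relies on.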
  • Publication
    The influence of network structures of Wikipedia discussion pages on the efficiency of WikiProjects
    The proliferation of online communities has attracted much attention to modelling user behaviour in terms of social interaction, language adoption and contribution activity. Nevertheless, when applied to large-scale and cross-platform behavioural data, existing approaches generally suffer from expressiveness, scalability and generality issues. This paper proposes trans-dimensional von Mises-Fisher (TvMF) mixture models for L2 normalised behavioural data, which encapsulate: (1) a Bayesian framework for vMF mixtures that enables prior knowledge and information sharing among clusters, (2) an extended version of the reversible jump MCMC algorithm that allows adaptive changes in the number of clusters for vMF mixtures when the model parameters are updated, and (3) an online TvMF mixture model that accommodates the dynamics of clusters for time-varying user behavioural data. We develop efficient collapsed Gibbs sampling techniques for posterior inference, which facilitate parallelism for parameter updates. Empirical results on simulated and real-world data show that the proposed TvMF mixture models can discover more interpretable and intuitive clusters than other widely-used models, such as k-means, non-negative matrix factorization (NMF), Dirichlet process Gaussian mixture models (DP-GMM), and dynamic topic models (DTM). We further evaluate the performance of the proposed models in real-world applications, such as churn prediction, which shows the usefulness of the generated features.
      585 | Scopus© Citations 14
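    For readers unfamiliar with the distribution named in the abstract above, the standard von Mises-Fisher density on the unit sphere and the corresponding finite mixture can be written as follows. The notation (d, K, the weights) is ours and is not taken from the paper; the trans-dimensional and online extensions the abstract describes are not shown.

```latex
% vMF density for a unit vector x in R^d, with unit mean direction \mu
% and concentration \kappa > 0; I_\nu is the modified Bessel function
% of the first kind.
f(x \mid \mu, \kappa) = C_d(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2 - 1}}{(2\pi)^{d/2}\, I_{d/2 - 1}(\kappa)},
\qquad \lVert x \rVert = \lVert \mu \rVert = 1 .

% Finite mixture with K components and weights \pi_k summing to one.
p(x) = \sum_{k=1}^{K} \pi_k\, f(x \mid \mu_k, \kappa_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
```

    Because the density depends on x only through the inner product with the mean direction, it suits L2 normalised data, where direction rather than magnitude carries the information.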
  • Publication
    Role Analysis in Networks Using Mixtures of Exponential Random Graph Models
    This article introduces a novel and flexible framework for investigating the roles of actors within a network. Particular interest is in roles as defined by local network connectivity patterns, identified using the ego-networks extracted from the network. A mixture of exponential-family random graph models (ERGM) is developed for these ego-networks to cluster the nodes into roles. We refer to this model as the ego-ERGM. An expectation-maximization algorithm is developed to infer the unobserved cluster assignments and to estimate the mixture model parameters using a maximum pseudo-likelihood approximation. We demonstrate the flexibility and utility of the method using examples of simulated and real networks.
      401 | Scopus© Citations 18
  • Publication
    Mixtures of biased sentiment analysers
    Modelling bias is an important consideration when dealing with inexpert annotations. We are concerned with training a classifier to perform sentiment analysis on news media articles, some of which have been manually annotated by volunteers. The classifier is trained on the words in the articles and then applied to non-annotated articles. In previous work we found that a joint estimation of the annotator biases and the classifier parameters performed better than estimation of the biases followed by training of the classifier. An important question follows from this result: can the annotators be usefully clustered into either predetermined or data-driven clusters, based on their biases? If so, such a clustering could be used to select, drop or otherwise categorise the annotators in a crowdsourcing task. This paper presents work on fitting a finite mixture model to the annotators’ bias. We develop a model and an algorithm and demonstrate its properties on simulated data. We then demonstrate the clustering that exists in our motivating dataset, namely the analysis of potentially economically relevant news articles from Irish online news sources.
      304 | Scopus© Citations 3
  • Publication
    Sentiment analysis of online media
    A joint model for annotation bias and document classification is presented in the context of media sentiment analysis. We consider an Irish online media data set comprising online news articles with user annotations of negative, positive or irrelevant impact on the Irish economy. The joint model combines a statistical model for user annotation bias and a Naive Bayes model for the document terms. An EM algorithm is used to estimate the annotation bias model, the unobserved biases in the user annotations, the classifier parameters and the sentiment of the articles. The joint modeling of both the user biases and the classifier is demonstrated to be superior to estimation of the bias followed by the estimation of the classifier parameters.
      1049
  • Publication
    Variational Bayesian inference for the Latent Position Cluster Model for network data
    A number of recent approaches to modeling social networks have focussed on embedding the nodes in a latent “social space”. Nodes that are in close proximity are more likely to form links than those that are distant. This naturally accounts for reciprocal and transitive relationships which are commonly found in many network datasets. The Latent Position Cluster Model is one such model that also explicitly incorporates clustering by modeling the locations using a finite Gaussian mixture model. Observed covariates and sociality random effects may also be modeled. However, inference for the model via MCMC is cumbersome and thus scaling to large networks is a challenge. Variational Bayesian methods offer an alternative inference methodology for this problem. Sampling-based MCMC is replaced by an optimization that requires many orders of magnitude fewer iterations to converge. A Variational Bayesian algorithm for the Latent Position Cluster Model is therefore developed and demonstrated.
      218 | Scopus© Citations 50
  • Publication
    Review of Statistical Network Analysis: Models, Algorithms, and Software
    The analysis of network data is an area that is rapidly growing, both within and outside of the discipline of statistics. This review provides a concise summary of methods and models used in the statistical analysis of network data, including the Erdős–Rényi model, the exponential family class of network models, and recently developed latent variable models. Many of the methods and models are illustrated by application to the well-known Zachary karate dataset. Software routines available for implementing methods are emphasized throughout. The aim of this paper is to provide a review with enough detail about many common classes of network models to whet the appetite and to point the way to further reading.
      9477 | Scopus© Citations 83
  • Publication
    Variational Bayesian inference for the Latent Position Cluster Model
    Many recent approaches to modeling social networks have focussed on embedding the actors in a latent “social space”. Links are more likely for actors that are close in social space than for actors that are distant in social space. In particular, the Latent Position Cluster Model (LPCM) [1] allows for explicit modelling of the clustering that is exhibited in many network datasets. However, inference for the LPCM via MCMC is cumbersome, and scaling this model to large or even medium-sized networks with many interacting nodes is a challenge. Variational Bayesian methods offer one solution to this problem. An approximate, closed-form posterior is formed, with unknown variational parameters. These parameters are tuned to minimize the Kullback-Leibler divergence between the approximate variational posterior and the true posterior, which is known only up to proportionality. The variational Bayesian approach is shown to give a computationally efficient way of fitting the LPCM. The approach is demonstrated on a number of data sets and it is shown to give a good fit.
      772
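    The Kullback-Leibler criterion mentioned in the abstract above can be made concrete with the standard variational identity below. The symbols (Y for the observed network, \theta for all unknowns, q for the variational posterior) are our notation and are not taken from the paper, and no LPCM-specific factorisation is assumed.

```latex
% KL divergence from the variational posterior q to the true posterior.
\mathrm{KL}\!\left(q \,\Vert\, p(\cdot \mid Y)\right)
  = \int q(\theta)\, \log \frac{q(\theta)}{p(\theta \mid Y)}\, d\theta .

% Decomposition of the log marginal likelihood into a lower bound plus the KL term.
\log p(Y)
  = \underbrace{\mathbb{E}_{q}\!\left[\log p(Y, \theta) - \log q(\theta)\right]}_{\text{lower bound}}
  + \mathrm{KL}\!\left(q \,\Vert\, p(\cdot \mid Y)\right) .
```

    Since \log p(Y) does not depend on q, minimising the divergence is equivalent to maximising the lower bound, which involves only the joint density p(Y, \theta). This is why the true posterior needs to be known only up to proportionality.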
  • Publication
    Exploring the Relationship between Membership Turnover and Productivity in Online Communities
    (Association for the Advancement of Artificial Intelligence, 2014-06-04)
    One of the more disruptive reforms associated with the modern Internet is the emergence of online communities working together on knowledge artefacts such as Wikipedia and OpenStreetMap. Recently it has become clear that these initiatives are vulnerable because of problems with membership turnover. This study presents a longitudinal analysis of 891 Wiki Projects where we model the impact of member turnover and social capital losses on project productivity. By examining social capital losses we attempt to provide a more nuanced analysis of member turnover. In this context social capital is modelled from a social network perspective where the loss of more central members has more impact. We find that only a small proportion of Wiki Projects are in a relatively healthy state with low levels of membership turnover and social capital losses. The results show that the relationship between social capital losses and project performance is U-shaped, and that member withdrawal has a significant negative effect on project outcomes. The results also support the mediation of turnover rate and network density on the curvilinear relationship.
      161
  • Publication
    Sentiment Analysis of Online Media
    A joint model for annotation bias and document classification is presented in the context of media sentiment analysis. We consider an Irish online media data set comprising online news articles with user annotations of negative, positive or irrelevant impact on the Irish economy. The joint model combines a statistical model for user annotation bias and a Naive Bayes model for the document terms. An EM algorithm is used to estimate the annotation bias model, the unobserved biases in the user annotations, the classifier parameters and the sentiment of the articles. The joint modeling of both the user biases and the classifier is demonstrated to be superior to estimation of the bias followed by the estimation of the classifier parameters.
      635