Salter-Townshend, Michael
Research Output
- Publication: Variational Bayesian inference for the Latent Position Cluster Model (2009-12)
Many recent approaches to modeling social networks have focussed on embedding the actors in a latent “social space”. Links are more likely for actors that are close in social space than for actors that are distant in social space. In particular, the Latent Position Cluster Model (LPCM) [1] allows for explicit modelling of the clustering that is exhibited in many network datasets. However, inference for the LPCM via MCMC is cumbersome, and scaling this model to large or even medium-sized networks with many interacting nodes is a challenge. Variational Bayesian methods offer one solution to this problem. An approximate, closed-form posterior is formed, with unknown variational parameters. These parameters are tuned to minimize the Kullback-Leibler divergence between the approximate variational posterior and the true posterior, which is known only up to proportionality. The variational Bayesian approach is shown to give a computationally efficient way of fitting the LPCM. The approach is demonstrated on a number of data sets and is shown to give a good fit.
- Publication: A latent space mapping for link prediction (2010-12-11)
Network modeling can be approached using either discriminative or probabilistic models. In the task of link prediction, a probabilistic model will give a probability for the existence of a link; while in some scenarios this may be beneficial, in others a hard discriminative boundary needs to be set, and the use of a discriminative classifier is then preferable. In domains such as image analysis and speaker recognition, probabilistic models have been used as a mechanism from which features can be extracted. This paper examines using a probabilistic model built on the entire graph to extract features to predict the existence of unknown links between two nodes. It demonstrates how features extracted from the model, as well as the predicted probability of a link existing, can aid the classification process.
- Publication: Sentiment analysis of online media
A joint model for annotation bias and document classification is presented in the context of media sentiment analysis. We consider an Irish online media data set comprising online news articles with user annotations of negative, positive or irrelevant impact on the Irish economy. The joint model combines a statistical model for user annotation bias and a Naive Bayes model for the document terms. An EM algorithm is used to estimate the annotation bias model, the unobserved biases in the user annotations, the classifier parameters and the sentiment of the articles. The joint modeling of both the user biases and the classifier is demonstrated to be superior to estimation of the bias followed by the estimation of the classifier parameters.
- Publication: Exploring the Relationship between Membership Turnover and Productivity in Online Communities (Association for the Advancement of Artificial Intelligence, 2014-06-04)
One of the more disruptive reforms associated with the modern Internet is the emergence of online communities working together on knowledge artefacts such as Wikipedia and OpenStreetMap. Recently it has become clear that these initiatives are vulnerable because of problems with membership turnover. This study presents a longitudinal analysis of 891 Wiki Projects where we model the impact of member turnover and social capital losses on project productivity. By examining social capital losses we attempt to provide a more nuanced analysis of member turnover. In this context social capital is modelled from a social network perspective where the loss of more central members has more impact. We find that only a small proportion of Wiki Projects are in a relatively healthy state with low levels of membership turnover and social capital losses. The results show that the relationship between social capital losses and project performance is U-shaped, and that member withdrawal has a significant negative effect on project outcomes. The results also support the mediation of turnover rate and network density on the curvilinear relationship.
- Publication: Sentiment Analysis of Online Media
A joint model for annotation bias and document classification is presented in the context of media sentiment analysis. We consider an Irish online media data set comprising online news articles with user annotations of negative, positive or irrelevant impact on the Irish economy. The joint model combines a statistical model for user annotation bias and a Naive Bayes model for the document terms. An EM algorithm is used to estimate the annotation bias model, the unobserved biases in the user annotations, the classifier parameters and the sentiment of the articles. The joint modeling of both the user biases and the classifier is demonstrated to be superior to estimation of the bias followed by the estimation of the classifier parameters.
- Publication: Online Trans-dimensional von Mises-Fisher Mixture Models for User Profiles (Journal of Machine Learning Research, 2016)
The proliferation of online communities has attracted much attention to modelling user behaviour in terms of social interaction, language adoption and contribution activity. Nevertheless, when applied to large-scale and cross-platform behavioural data, existing approaches generally suffer from expressiveness, scalability and generality issues. This paper proposes trans-dimensional von Mises-Fisher (TvMF) mixture models for L2 normalised behavioural data, which encapsulate: (1) a Bayesian framework for vMF mixtures that enables prior knowledge and information sharing among clusters, (2) an extended version of the reversible jump MCMC algorithm that allows adaptive changes in the number of clusters for vMF mixtures when the model parameters are updated, and (3) an online TvMF mixture model that accommodates the dynamics of clusters for time-varying user behavioural data. We develop efficient collapsed Gibbs sampling techniques for posterior inference, which facilitates parallelism for parameter updates. Empirical results on simulated and real-world data show that the proposed TvMF mixture models can discover more interpretable and intuitive clusters than other widely-used models, such as k-means, non-negative matrix factorization (NMF), Dirichlet process Gaussian mixture models (DP-GMM), and dynamic topic models (DTM). We further evaluate the performance of the proposed models in real-world applications, such as the churn prediction task, which shows the usefulness of the features generated.
- Publication: Role Analysis in Networks Using Mixtures of Exponential Random Graph Models
This article introduces a novel and flexible framework for investigating the roles of actors within a network. Particular interest is in roles as defined by local network connectivity patterns, identified using the ego-networks extracted from the network. A mixture of exponential-family random graph models (ERGM) is developed for these ego-networks to cluster the nodes into roles. We refer to this model as the ego-ERGM. An expectation-maximization algorithm is developed to infer the unobserved cluster assignments and to estimate the mixture model parameters using a maximum pseudo-likelihood approximation. We demonstrate the flexibility and utility of the method using examples of simulated and real networks.
Scopus© Citations: 16
- Publication: Review of Statistical Network Analysis: Models, Algorithms, and Software (Wiley-Blackwell, 2012-08)
The analysis of network data is an area that is rapidly growing, both within and outside of the discipline of statistics. This review provides a concise summary of methods and models used in the statistical analysis of network data, including the Erdős–Rényi model, the exponential family class of network models, and recently developed latent variable models. Many of the methods and models are illustrated by application to the well-known Zachary karate dataset. Software routines available for implementing methods are emphasized throughout. The aim of this paper is to provide a review with enough detail about many common classes of network models to whet the appetite and to point the way to further reading.
Scopus© Citations: 80
- Publication: Latent space models for multiview network data
Social relationships consist of interactions along multiple dimensions. In social networks, this means that individuals form multiple types of relationships with the same person (an individual will not trust all of his/her acquaintances, for example). Statistical models for these data require understanding two related types of dependence structure: (i) structure within each relationship type, or network view, and (ii) the association between views. In this paper we propose a statistical framework that parsimoniously represents dependence between relationship types while also maintaining enough flexibility to allow individuals to serve different roles in different relationship types. Our approach builds on work on latent space models for networks (see Hoff et al. (2002), for example). These models represent the propensity for two individuals to form edges as conditionally independent given the distance between the individuals in an unobserved social space. Our work departs from previous work in this area by representing dependence structure between network views through a Multivariate Bernoulli likelihood, providing a representation of between-view association. This approach infers correlations between views not explained by the latent space model. Using our method, we explore 6 multiview network structures across 75 villages in rural southern Karnataka, India (Banerjee et al., 2013).
Scopus© Citations: 18
- Publication: The influence of network structures of Wikipedia discussion pages on the efficiency of WikiProjects
The proliferation of online communities has attracted much attention to modelling user behaviour in terms of social interaction, language adoption and contribution activity. Nevertheless, when applied to large-scale and cross-platform behavioural data, existing approaches generally suffer from expressiveness, scalability and generality issues. This paper proposes trans-dimensional von Mises-Fisher (TvMF) mixture models for L2 normalised behavioural data, which encapsulate: (1) a Bayesian framework for vMF mixtures that enables prior knowledge and information sharing among clusters, (2) an extended version of the reversible jump MCMC algorithm that allows adaptive changes in the number of clusters for vMF mixtures when the model parameters are updated, and (3) an online TvMF mixture model that accommodates the dynamics of clusters for time-varying user behavioural data. We develop efficient collapsed Gibbs sampling techniques for posterior inference, which facilitates parallelism for parameter updates. Empirical results on simulated and real-world data show that the proposed TvMF mixture models can discover more interpretable and intuitive clusters than other widely-used models, such as k-means, non-negative matrix factorization (NMF), Dirichlet process Gaussian mixture models (DP-GMM), and dynamic topic models (DTM). We further evaluate the performance of the proposed models in real-world applications, such as the churn prediction task, which shows the usefulness of the features generated.
Scopus© Citations: 10