Publication

Variational Bayesian inference for the Latent Position Cluster Model

2009-12, Salter-Townshend, Michael, Murphy, Thomas Brendan

Many recent approaches to modeling social networks have focussed on embedding the actors in a latent “social space”. Links are more likely for actors that are close in social space than for actors that are distant in social space. In particular, the Latent Position Cluster Model (LPCM) [1] allows for explicit modelling of the clustering that is exhibited in many network datasets. However, inference for the LPCM via MCMC is cumbersome, and scaling this model to large or even medium-sized networks with many interacting nodes is a challenge. Variational Bayesian methods offer one solution to this problem. An approximate, closed form posterior is formed, with unknown variational parameters. These parameters are tuned to minimize the Kullback-Leibler divergence between the approximate variational posterior and the true posterior, which is known only up to proportionality. The variational Bayesian approach is shown to give a computationally efficient way of fitting the LPCM. The approach is demonstrated on a number of data sets and it is shown to give a good fit.
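
As a brief illustration of why a posterior known only up to proportionality is sufficient, the standard variational identity can be written as follows. The symbols are assumptions introduced here, not notation from the abstract: Y denotes the observed network, \theta collects all unknowns (latent positions and cluster parameters), and q is the approximate posterior.

```latex
\log p(Y)
  = \underbrace{\mathbb{E}_{q}\!\left[\log p(Y \mid \theta)\, p(\theta) - \log q(\theta)\right]}_{\mathcal{L}(q)}
  + \mathrm{KL}\!\left(q(\theta) \,\middle\|\, p(\theta \mid Y)\right)
```

Because \log p(Y) does not depend on q, minimizing the Kullback-Leibler term is equivalent to maximizing \mathcal{L}(q), which involves only the unnormalized posterior p(Y \mid \theta)\, p(\theta).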

Publication

Clustering ranked preference data using sociodemographic covariates

2010-01, Gormley, Isobel Claire, Murphy, Thomas Brendan

Ranked preference data arise when a set of judges rank, in order of their preference, a set of objects. Such data arise in preferential voting systems and market research surveys. Covariate data associated with the judges are also often recorded. Such covariate data should be used in conjunction with preference data when drawing inferences about judges. To cluster a population of judges, the population is modelled as a collection of homogeneous groups. The Plackett-Luce model for ranked data is employed to model a judge’s ranked preferences within a group. A mixture of Plackett-Luce models is employed to model the population of judges, where each component in the mixture represents a group of judges. Mixture of experts models provide a framework in which covariates are included in mixture models. Covariates are included through the mixing proportions and the component density parameters. A mixture of experts model for ranked preference data is developed by combining a mixture of experts model and a mixture of Plackett-Luce models. Particular attention is given to the manner in which covariates enter the model. The mixing proportions and group specific parameters are potentially dependent on covariates. Model selection procedures are employed to choose optimal models. Model parameters are estimated via the ‘EMM algorithm’, a hybrid of the Expectation-Maximization and the Minorization-Maximization algorithms. Examples are provided through a menu survey and through Irish election data. Results indicate mixture modelling using covariates is insightful when examining a population of judges who express preferences.
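
The Plackett-Luce model mentioned above can be sketched in a few lines. This is a minimal illustration of the standard single-group model only, not the authors' mixture of experts implementation; the function name `plackett_luce_log_prob` and the `support` dictionary are hypothetical names chosen for this example.

```python
import math


def plackett_luce_log_prob(ranking, support):
    """Log-probability of a (possibly partial) ranking under Plackett-Luce.

    ranking: object ids, most preferred first.
    support: dict mapping every available object id to a positive
             support parameter (hypothetical name for this sketch).
    """
    # Total support of the objects not yet chosen.
    remaining = sum(support.values())
    log_prob = 0.0
    for obj in ranking:
        # At each stage the judge picks `obj` with probability
        # proportional to its support among the remaining objects.
        log_prob += math.log(support[obj]) - math.log(remaining)
        remaining -= support[obj]
    return log_prob


# Example: three objects, full ranking a > b > c.
support = {"a": 0.5, "b": 0.3, "c": 0.2}
print(math.exp(plackett_luce_log_prob(["a", "b", "c"], support)))
```

Because unranked objects stay in the denominator at every stage, the same function handles top-k rankings such as `["a"]`.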

Publication

Mixture of experts modelling with social science applications

2011-04, Gormley, Isobel Claire, Murphy, Thomas Brendan

Publication

Preferences in college applications - a nonparametric Bayesian analysis of top-10 rankings

2010-12-10, Ali, Alnur, Murphy, Thomas Brendan, Meila, Marina, Chen, Harr

Applicants to degree courses in Irish colleges and universities rank up to ten degree courses from a list of over five hundred. These data provide a wealth of information concerning applicant degree choices. A Dirichlet process mixture of generalized Mallows models is used to explore data from a cohort of applicants. We find strong and diverse clusters, which in turn give us important insights into the workings of the system. No previously tried model or analysis technique is able to model the data with comparable accuracy.

Publication

Review of Statistical Network Analysis: Models, Algorithms, and Software

2012-08, Salter-Townshend, Michael, White, Arthur, Gollini, Isabella, Murphy, Thomas Brendan

The analysis of network data is an area that is rapidly growing, both within and outside of the discipline of statistics. This review provides a concise summary of methods and models used in the statistical analysis of network data, including the Erdős–Rényi model, the exponential family class of network models, and recently developed latent variable models. Many of the methods and models are illustrated by application to the well-known Zachary karate dataset. Software routines available for implementing methods are emphasized throughout. The aim of this paper is to provide a review with enough detail about many common classes of network models to whet the appetite and to point the way to further reading.
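
The simplest of the models named above, the Erdős–Rényi model, can be sketched as follows. This is an illustrative G(n, p) sampler written for this listing, not one of the software routines discussed in the review; the function name `sample_erdos_renyi` and the value of `p` are assumptions. The choice n = 34 matches the number of members in Zachary's karate club.

```python
import itertools
import random


def sample_erdos_renyi(n, p, seed=None):
    """Sample an undirected G(n, p) graph as a list of edges (i, j), i < j.

    Every one of the n * (n - 1) / 2 possible edges is included
    independently with probability p.
    """
    rng = random.Random(seed)
    return [
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if rng.random() < p
    ]


# Example: a random graph on 34 nodes with an arbitrary edge probability.
edges = sample_erdos_renyi(34, 0.14, seed=1)
print(len(edges), "edges out of", 34 * 33 // 2, "possible")
```

Since every pair of nodes is treated identically, this model cannot reproduce clustering or degree heterogeneity, which is one motivation for the richer model classes the review goes on to cover.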

Publication

Model-Based clustering of microarray expression data via latent Gaussian mixture models

2010-11-01, McNicholas, Paul D., Murphy, Thomas Brendan

In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation–maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.
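
To illustrate why a factor analysis covariance structure helps when the number of samples is small, the sketch below compares free covariance parameters per mixture component. It assumes the standard unconstrained factor analysis form, covariance = Lambda Lambda' + Psi with Psi diagonal, for p variables and q latent factors; the function names are hypothetical, and the BIC sign convention shown (larger is better) is an assumption, since the abstract does not state one.

```python
import math


def full_cov_params(p):
    """Free parameters in an unrestricted p x p covariance matrix."""
    return p * (p + 1) // 2


def factor_cov_params(p, q):
    """Free parameters in Lambda Lambda' + Psi (Psi diagonal).

    p * q loadings, minus q * (q - 1) / 2 for rotational invariance,
    plus p diagonal noise variances.
    """
    return p * q - q * (q - 1) // 2 + p


def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' form (an assumed convention)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


# Example: 1000 genes, 2 latent factors.
print(full_cov_params(1000))       # 500500
print(factor_cov_params(1000, 2))  # 2999
```

The count grows linearly in p for fixed q rather than quadratically, which is what makes estimation feasible with few samples.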

Publication

Model-based clustering of longitudinal data

2010-03, McNicholas, Paul D., Murphy, Thomas Brendan

A new family of mixture models for the model-based clustering of longitudinal data is introduced. The covariance structures of eight members of this new family of models are given and the associated maximum likelihood estimates for the parameters are derived via expectation-maximization (EM) algorithms. The Bayesian information criterion is used for model selection and a convergence criterion based on Aitken’s acceleration is used to determine convergence of these EM algorithms. This new family of models is applied to yeast sporulation time course data, where the models give good clustering performance. Further constraints are then imposed on the decomposition to allow a deeper investigation of the correlation structure of the yeast data. These constraints greatly extend this new family of models, with the addition of many parsimonious models.
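
A convergence check based on Aitken's acceleration can be sketched as below. This is one common form of the criterion, shown under assumptions: the abstract does not give the exact stopping rule or tolerance, and the function names `aitken_estimate` and `aitken_converged` are hypothetical.

```python
def aitken_estimate(l_prev, l_curr, l_next):
    """Aitken estimate of the limiting log-likelihood from three iterates.

    Returns None when the estimate is undefined (no progress in the
    previous step, or an acceleration factor of 1 or more).
    """
    denom = l_curr - l_prev
    if denom == 0:
        return None
    a = (l_next - l_curr) / denom  # Aitken acceleration factor
    if a >= 1:
        return None
    return l_curr + (l_next - l_curr) / (1.0 - a)


def aitken_converged(logliks, tol=1e-6):
    """True when the latest log-likelihood is within tol of the estimate."""
    if len(logliks) < 3:
        return False
    l_inf = aitken_estimate(*logliks[-3:])
    if l_inf is None:
        # Fall back to a plain check on the last increment.
        return abs(logliks[-1] - logliks[-2]) < tol
    return abs(l_inf - logliks[-1]) < tol


# Example: log-likelihoods approaching -10 geometrically.
print(aitken_estimate(-18.0, -14.0, -12.0))   # -10.0
print(aitken_converged([-18.0, -14.0, -12.0]))  # False
```

Unlike a test on the raw increment, this rule compares the current value with an estimate of the limit, so it is less likely to stop early when EM is progressing slowly.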