Murphy, Thomas Brendan
Preferred name
Murphy, Thomas Brendan
Official Name
Murphy, Thomas Brendan
Research Output
Publication
Variational Bayesian inference for the Latent Position Cluster Model
2009-12, Salter-Townshend, Michael, Murphy, Thomas Brendan
Many recent approaches to modeling social networks have focussed on embedding
the actors in a latent “social space”. Links are more likely for actors that are
close in social space than for actors that are distant in social space. In particular,
the Latent Position Cluster Model (LPCM) [1] allows for explicit modelling of
the clustering that is exhibited in many network datasets. However, inference for
the LPCM model via MCMC is cumbersome and scaling of this model to large
or even medium size networks with many interacting nodes is a challenge. Variational
Bayesian methods offer one solution to this problem. An approximate,
closed form posterior is formed, with unknown variational parameters. These
parameters are tuned to minimize the Kullback-Leibler divergence between the
approximate variational posterior and the true posterior, which is known only up to
proportionality. The variational Bayesian approach is shown to give a computationally
efficient way of fitting the LPCM. The approach is demonstrated on a
number of data sets and it is shown to give a good fit.
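As a hedged illustration of why an unnormalized posterior suffices here, the standard variational identity can be written as follows. The symbols $y$ (observed network), $\theta$ (all latent quantities), $q$ (the approximate posterior) and $\mathcal{L}(q)$ (the lower bound) are notational assumptions, not taken from the paper. The direction $\mathrm{KL}(q \,\|\, p)$ is the usual variational Bayes convention and is also assumed.

```latex
\log p(y)
  = \underbrace{\mathbb{E}_{q}\bigl[\log p(y, \theta)\bigr]
      - \mathbb{E}_{q}\bigl[\log q(\theta)\bigr]}_{\mathcal{L}(q)}
  + \mathrm{KL}\bigl(q(\theta) \,\|\, p(\theta \mid y)\bigr)
```

Because $\log p(y)$ does not depend on $q$, minimizing the Kullback-Leibler term is equivalent to maximizing $\mathcal{L}(q)$, which involves only the joint density $p(y, \theta)$, that is, the posterior up to proportionality.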
Publication
Clustering ranked preference data using sociodemographic covariates
2010-01, Gormley, Isobel Claire, Murphy, Thomas Brendan
Ranked preference data arise when a set of judges rank, in order of their preference, a set of objects. Such data arise in preferential voting systems and market
research surveys. Covariate data associated with the judges are also often recorded.
Such covariate data should be used in conjunction with preference data when drawing inferences about judges.
To cluster a population of judges, the population is modelled as a collection
of homogeneous groups. The Plackett-Luce model for ranked data is employed to
model a judge’s ranked preferences within a group. A mixture of Plackett-Luce
models is employed to model the population of judges, where each component in
the mixture represents a group of judges.
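The ranking probability described above can be sketched in a few lines of Python. This is a minimal illustration of the standard Plackett-Luce form and a finite mixture of such components, not the authors' implementation. The function names, the `support` dictionaries and the example values are hypothetical.

```python
def plackett_luce_prob(ranking, support):
    """Probability of a (possibly partial) ranking under a Plackett-Luce model.

    ranking: objects in order of preference, most preferred first.
    support: dict mapping every available object to a positive support value.
    """
    remaining = sum(support.values())  # total support of objects not yet chosen
    prob = 1.0
    for obj in ranking:
        # At each stage the judge picks among the remaining objects
        # with probability proportional to the object's support.
        prob *= support[obj] / remaining
        remaining -= support[obj]
    return prob


def mixture_prob(ranking, weights, supports):
    """Probability of a ranking under a mixture of Plackett-Luce components.

    weights: mixing proportions, one per group, summing to one.
    supports: one support dict per group.
    """
    return sum(w * plackett_luce_prob(ranking, s)
               for w, s in zip(weights, supports))


# Illustrative values only (not from the paper).
group_a = {"a": 0.5, "b": 0.3, "c": 0.2}
group_b = {"a": 0.2, "b": 0.3, "c": 0.5}
p = mixture_prob(["a", "b", "c"], [0.6, 0.4], [group_a, group_b])
```

Each mixture component corresponds to one group of judges. In this sketch the mixing proportions are fixed numbers, whereas a mixture of experts model would let them depend on covariates.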
Mixture of experts models provide a framework in which covariates are included
in mixture models. Covariates are included through the mixing proportions and
the component density parameters. A mixture of experts model for ranked preference data is developed by combining a mixture of experts model and a mixture of
Plackett-Luce models. Particular attention is given to the manner in which covariates enter the model. The mixing proportions and group specific parameters are potentially dependent on covariates. Model selection procedures are employed to
choose optimal models.
Model parameters are estimated via the ‘EMM algorithm’, a hybrid of the
Expectation-Maximization and the Minorization-Maximization algorithms. Examples are provided through a menu survey and through Irish election data. Results
indicate mixture modelling using covariates is insightful when examining a population of judges who express preferences.
Publication
Preferences in college applications - a nonparametric Bayesian analysis of top-10 rankings
2010-12-10, Ali, Alnur, Murphy, Thomas Brendan, Meila, Marina, Chen, Harr
Applicants to degree courses in Irish colleges and universities rank up to ten degree courses from a list of over five hundred. These data provide a wealth of
information concerning applicant degree choices. A Dirichlet process mixture of
generalized Mallows models is used to explore data from a cohort of applicants.
We find strong and diverse clusters, which in turn give us important insights into
the workings of the system. No previously tried model or analysis technique is
able to model the data with comparable accuracy.
Publication
Review of Statistical Network Analysis: Models, Algorithms, and Software
2012-08, Salter-Townshend, Michael, White, Arthur, Gollini, Isabella, Murphy, Thomas Brendan
The analysis of network data is an area that is rapidly growing, both within and outside of the discipline of statistics.
This review provides a concise summary of methods and models used in the statistical analysis of network data, including the Erdős–Rényi model, the exponential family class of network models, and recently developed latent variable models. Many of the methods and models are illustrated by application to the well-known Zachary karate dataset. Software routines available for implementing methods are emphasized throughout.
The aim of this paper is to provide a review with enough detail about many common classes of network models to whet the appetite and to point the way to further reading.
Publication
Model-Based clustering of microarray expression data via latent Gaussian mixture models
2010-11-01, McNicholas, Paul D., Murphy, Thomas Brendan
In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation–maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.
Publication
Model-based clustering of longitudinal data
2010-03, McNicholas, Paul D., Murphy, Thomas Brendan
A new family of mixture models for the model-based clustering of longitudinal data is introduced.
The covariance structures of eight members of this new family of models are given and the associated maximum likelihood estimates for the parameters are derived via expectation-maximization (EM) algorithms.
The Bayesian information criterion is used for model selection and a convergence criterion based on Aitken’s
acceleration is used to determine convergence of these EM algorithms. This new family of models is applied to yeast sporulation time course data, where the models give good clustering performance. Further
constraints are then imposed on the decomposition to allow a deeper investigation of correlation structure
of the yeast data. These constraints greatly extend this new family of models, with the addition of many
parsimonious models.
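A convergence check based on Aitken's acceleration can be sketched as below. This follows one commonly used form of the criterion (estimate the asymptotic log-likelihood from three successive values and stop when the current value is close to it). The abstract does not state the exact variant or tolerance used, so the function name, the stopping rule and the default `tol` are assumptions.

```python
def aitken_converged(loglik, tol=1e-2):
    """Aitken-acceleration stopping rule for an EM algorithm (one common form).

    loglik: log-likelihood values from successive EM iterations.
    Returns True when the estimated asymptotic log-likelihood is within
    `tol` of the most recent value.
    """
    if len(loglik) < 3:
        return False  # three values are needed to estimate the rate
    l_prev, l_curr, l_next = loglik[-3:]
    denom = l_curr - l_prev
    if denom == 0:
        return True  # the log-likelihood has stopped changing
    a = (l_next - l_curr) / denom  # Aitken acceleration (rate estimate)
    if a >= 1:
        return False  # increments are not shrinking yet
    # Estimated limit of the log-likelihood sequence.
    l_inf = l_curr + (l_next - l_curr) / (1 - a)
    return abs(l_inf - l_next) < tol
```

In an EM loop, the log-likelihood would be appended to a list after each iteration and the loop stopped once `aitken_converged` returns True.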