Murphy, Thomas Brendan
Research Output
Publication
Variational Bayesian inference for the Latent Position Cluster Model
2009-12, Salter-Townshend, Michael, Murphy, Thomas Brendan
Many recent approaches to modeling social networks have focussed on embedding
the actors in a latent “social space”. Links are more likely for actors that are
close in social space than for actors that are distant in social space. In particular,
the Latent Position Cluster Model (LPCM) [1] allows for explicit modelling of
the clustering that is exhibited in many network datasets. However, inference for
the LPCM model via MCMC is cumbersome and scaling of this model to large
or even medium size networks with many interacting nodes is a challenge. Variational
Bayesian methods offer one solution to this problem. An approximate,
closed form posterior is formed, with unknown variational parameters. These
parameters are tuned to minimize the Kullback-Leibler divergence between the
approximate variational posterior and the true posterior, which is known only up to
proportionality. The variational Bayesian approach is shown to give a computationally
efficient way of fitting the LPCM. The approach is demonstrated on a
number of data sets and it is shown to give a good fit.
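As an illustration of the latent space idea in this abstract, the sketch below computes a link probability that falls as two actors move apart in the latent space. It assumes the commonly used logistic form, log-odds = beta - distance, which the abstract does not spell out; the function name `link_probability` and the intercept `beta` are illustrative only.

```python
import math

def link_probability(z_i, z_j, beta):
    """Probability of a link between actors at latent positions z_i and z_j.

    Assumed form (not stated in the abstract): logit(p) = beta - ||z_i - z_j||.
    """
    distance = math.dist(z_i, z_j)  # Euclidean distance in the latent space
    return 1.0 / (1.0 + math.exp(-(beta - distance)))

# Two actors that are close, and two that are far apart (made-up positions).
near = link_probability((0.0, 0.0), (0.5, 0.0), beta=1.0)
far = link_probability((0.0, 0.0), (3.0, 4.0), beta=1.0)
```

Under this assumed form, `near` exceeds `far`, matching the statement that links are more likely between actors that are close in social space.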
Publication
Standardizing interestingness measures for association rules
2018-12, Shaikh, Mateen, McNicholas, Paul D., Antonie, M. Luiza, Murphy, Thomas Brendan
Interestingness measures provide information about association rules. The value of an interestingness measure is often interpreted relative to the overall range of the interestingness measure. However, properties of individual association rules can further restrict what value an interestingness measure can achieve. These additional constraints are not typically taken into account in analysis, potentially misleading the investigator. Considering the value of an interestingness measure relative to this further constrained range provides greater insight than the original range alone and can even alter researchers' impressions of the data. Standardizing interestingness measures takes these additional restrictions into account, resulting in values that provide a relative measure of the attainable values. We explore the impacts of standardizing interestingness measures on real and simulated data.
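To make the idea of a constrained range concrete, the sketch below standardizes the lift of a rule A => B. It assumes the only constraints are those implied by the supports P(A) and P(B), so that P(A and B) lies between max(P(A) + P(B) - 1, 0) and min(P(A), P(B)); any further restrictions considered in the paper, such as minimum support or confidence thresholds, are omitted. The function names are hypothetical.

```python
def lift_bounds(p_a, p_b):
    """Attainable range of lift for fixed supports P(A) and P(B)."""
    lower = max(p_a + p_b - 1.0, 0.0) / (p_a * p_b)
    upper = min(p_a, p_b) / (p_a * p_b)
    return lower, upper

def standardized_lift(p_a, p_b, p_ab):
    """Lift rescaled to [0, 1] relative to its attainable range."""
    lift = p_ab / (p_a * p_b)
    lower, upper = lift_bounds(p_a, p_b)
    if upper == lower:
        # Happens when P(A) or P(B) equals 1: lift can only take one value.
        raise ValueError("lift has a single attainable value for these supports")
    return (lift - lower) / (upper - lower)

# Made-up supports: two frequent items that co-occur in 70% of transactions.
raw_lift = 0.7 / (0.8 * 0.8)
scaled = standardized_lift(0.8, 0.8, 0.7)
```

Here the raw lift is only slightly above 1, yet it sits halfway through the range that is attainable for these supports, which is the kind of shift in interpretation the abstract describes.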
Publication
Preferences in college applications - a nonparametric Bayesian analysis of top-10 rankings
2010-12-10, Ali, Alnur, Murphy, Thomas Brendan, Meila, Marina, Chen, Harr
Applicants to degree courses in Irish colleges and universities rank up to ten degree courses from a list of over five hundred. These data provide a wealth of
information concerning applicant degree choices. A Dirichlet process mixture of
generalized Mallows models is used to explore data from a cohort of applicants.
We find strong and diverse clusters, which in turn give us important insights into
the workings of the system. No previously tried model or analysis technique is
able to model the data with comparable accuracy.
Publication
Semi-supervised linear discriminant analysis
2011-12, Toher, Deirdre, Downey, Gerard, Murphy, Thomas Brendan
Fisher's linear discriminant analysis is one of the most commonly used and studied classification methods in chemometrics. The method finds a projection of multivariate data into a lower dimensional space so that the groups in the data are well separated. The resulting projected values are subsequently used to classify unlabeled observations into the groups.
A semi-supervised version of Fisher's linear discriminant analysis is developed, so that the unlabeled observations are also used in the model fitting procedure. This approach is advantageous when few labeled and many unlabeled observations are available.
The semi-supervised linear discriminant analysis method is demonstrated on a number of data sets where it is shown to yield better separation of the groups and improved classification over Fisher's linear discriminant analysis.
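For readers unfamiliar with the projection step, the sketch below computes the classical (fully supervised) Fisher discriminant direction for two groups in two dimensions, w = S_w^{-1} (m1 - m0), where S_w is the pooled within-group scatter matrix. It does not implement the semi-supervised extension developed in the paper, and the helper name `fisher_direction` and the toy data are assumptions made for illustration.

```python
def fisher_direction(class0, class1):
    """Fisher discriminant direction for two groups of 2-D points."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    m0, m1 = mean(class0), mean(class1)

    # Pooled within-group scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for points, m in ((class0, m0), (class1, m1)):
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy

    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("within-group scatter matrix is singular")

    # Multiply the inverse scatter matrix by the difference in group means.
    dmx, dmy = m1[0] - m0[0], m1[1] - m0[1]
    return ((syy * dmx - sxy * dmy) / det, (sxx * dmy - sxy * dmx) / det)

# Toy labeled data: two unit squares separated along the first axis.
group0 = [(0, 0), (1, 1), (0, 1), (1, 0)]
group1 = [(4, 0), (5, 1), (4, 1), (5, 0)]
w = fisher_direction(group0, group1)
projected0 = [w[0] * x + w[1] * y for x, y in group0]
projected1 = [w[0] * x + w[1] * y for x, y in group1]
```

In this toy example the direction points along the first axis, and the projected values of the two groups do not overlap.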
Publication
A robust approach to model-based classification based on trimming and constraints
2019-08-14, Cappozzo, Andrea, Greselin, Francesca, Murphy, Thomas Brendan
In a standard classification framework a set of trustworthy learning data are employed to build a decision rule, with the final aim of classifying unlabelled units belonging to the test set. Therefore, unreliable labelled observations, namely outliers and data with incorrect labels, can strongly undermine the classifier performance, especially if the training size is small. The present work introduces a robust modification to the Model-Based Classification framework, employing impartial trimming and constraints on the ratio between the maximum and the minimum eigenvalue of the group scatter matrices. The proposed method effectively handles noise presence in both response and exploratory variables, providing reliable classification even when dealing with contaminated datasets. A robust information criterion is proposed for model selection. Experiments on real and simulated data, artificially adulterated, are provided to underline the benefits of the proposed method.
Publication
A Mixture of Experts Latent Position Cluster Model for Social Network Data
2010-05, Gormley, Isobel Claire, Murphy, Thomas Brendan
Social network data represent the interactions between a group of social actors. Interactions between colleagues and friendship networks are typical examples of such data. The latent space model for social network data locates each actor in a network in a latent (social) space and models the probability of an interaction between two actors as a function of their locations. The latent position cluster model extends the latent space model to deal with network data in which clusters of actors exist — actor locations are drawn from a finite mixture model, each component of which represents a cluster of actors. A mixture of experts model builds on the structure of a mixture model by taking account of both observations and associated covariates when modeling a heterogeneous population. Herein, a mixture of experts extension of the latent position cluster model is developed. The mixture of experts framework allows covariates to enter the latent position cluster model in a number of ways, yielding different model interpretations. Estimates of the model parameters are derived in a Bayesian framework using a Markov Chain Monte Carlo algorithm. The algorithm is generally computationally expensive — surrogate proposal distributions which shadow the target distributions are derived, reducing the computational burden. The methodology is demonstrated through an illustrative example detailing relationships between a group of lawyers in the USA.
Publication
Model-Based clustering of microarray expression data via latent Gaussian mixture models
2010-11-01, McNicholas, Paul D., Murphy, Thomas Brendan
In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation–maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.
Publication
A grade of membership model for rank data
2009-06, Gormley, Isobel Claire, Murphy, Thomas Brendan
A grade of membership (GoM) model is an individual level mixture model which allows individuals to have partial membership of the groups that characterize a population. A GoM model for rank data is developed to model the particular case when the response data is ranked in nature. A Metropolis-within-Gibbs sampler provides the framework for model fitting, but the intricate nature of the rank data models makes the selection of suitable proposal distributions difficult. 'Surrogate' proposal distributions are constructed using ideas from optimization transfer algorithms. Model fitting issues such as label switching and model selection are also addressed. The GoM model for rank data is illustrated through an analysis of Irish election data where voters rank some or all of the candidates in order of preference. Interest lies in highlighting distinct groups of voters with similar preferences (i.e. 'voting blocs') within the electorate, taking into account the rank nature of the response data, and in examining individuals’ voting bloc memberships. The GoM model for rank data is fitted to data from an opinion poll conducted during the Irish presidential election campaign in 1997.
Publication
Bayesian Nonparametric Plackett-Luce Models for the Analysis of Preferences for College Degree Programmes
2014, Caron, François, Whye Teh, Yee, Murphy, Thomas Brendan
In this paper we propose a Bayesian nonparametric model for clustering partial ranking data. We start by developing a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with prior specified by a completely random measure. We characterise the posterior distribution given data, and derive a simple and effective Gibbs sampler for posterior simulation. We then develop a Dirichlet process mixture extension of our model and apply it to investigate the clustering of preferences for college degree programmes amongst Irish secondary school graduates. The existence of clusters of applicants who have similar preferences for degree programmes is established and we determine that subject matter and geographical location of the third level institution characterise these clusters.
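As background, the sketch below evaluates the probability of a partial (top-m) ranking under the standard finite Plackett-Luce model, in which items are chosen one at a time with probability proportional to their weight among the items not yet chosen. It covers only the finite model with known weights, not the nonparametric extension or the Gibbs sampler described in the abstract; the item labels and weights are made up.

```python
def plackett_luce_probability(ranking, weights):
    """Probability of a (possibly partial) ranking under Plackett-Luce.

    ranking: items in order of preference, best first.
    weights: dict mapping every available item to a positive weight.
    """
    remaining = sum(weights.values())  # total weight of items not yet chosen
    probability = 1.0
    for item in ranking:
        probability *= weights[item] / remaining
        remaining -= weights[item]
    return probability

# Three hypothetical degree programmes with illustrative weights.
weights = {"arts": 1.0, "science": 1.0, "medicine": 2.0}
top_two = plackett_luce_probability(["medicine", "arts"], weights)
```

With these weights the top-two ranking has probability (2/4) x (1/2) = 0.25.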
Publication
Motor insurance claim modelling with factor collapsing and Bayesian model averaging
2018-03-26, Hu, Sen, O'Hagan, Adrian, Murphy, Thomas Brendan
Accidental damage is a typical component of motor insurance claims. Modeling of this nature generally involves analysis of past claim history and different characteristics of the insured objects and the policyholders. Generalized linear models (GLMs) have become the industry’s standard approach for pricing and modeling risks of this nature. However, the GLM approach utilizes a single best model on which loss predictions are based, which ignores the uncertainty among the competing models and variable selection. An additional characteristic of motor insurance datasets is the presence of many categorical variables, within which the number of levels is high. In particular, not all levels of such variables may be statistically significant and rather some subsets of the levels may be merged to give a smaller overall number of levels for improved model parsimony and interpretability. A method is proposed for assessing the optimal manner in which to collapse a factor with many levels into one with a smaller number of levels, then Bayesian model averaging (BMA) is used to blend model predictions from all reasonable models to account for factor collapsing uncertainty. This method will be computationally intensive due to the number of factors being collapsed as well as the possibly large number of levels within factors. Hence a stochastic optimisation is proposed to quickly find the best collapsing cases across the model space.
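The blending step can be sketched as follows. The abstract does not say how the model weights are obtained, so this example assumes the familiar BIC approximation to posterior model probabilities, with weights proportional to exp(-BIC/2); the function names, BIC values and predictions are hypothetical.

```python
import math

def bma_weights_from_bic(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumption: weight_k is proportional to exp(-BIC_k / 2), equal prior
    probability for every candidate model.
    """
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

def bma_predict(predictions, weights):
    """Weighted average of the predictions made by each candidate model."""
    return sum(p * w for p, w in zip(predictions, weights))

# Two candidate factor-collapsing schemes with made-up BICs and predictions.
weights = bma_weights_from_bic([100.0, 102.0])
blended = bma_predict([10.0, 20.0], weights)
```

The model with the lower BIC receives the larger weight, so the blended prediction lies closer to that model's prediction than to the other.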