  • Publication
    Mixtures of Experts Models
    Mixtures of experts models provide a framework in which covariates may be included in mixture models. This is achieved by modelling the parameters of the mixture model as functions of the concomitant covariates. Given their mixture model foundation, mixtures of experts models possess a diverse range of analytic uses, from clustering observations to capturing parameter heterogeneity in cross-sectional data. This chapter focuses on delineating the mixture of experts modelling framework and demonstrates the utility and flexibility of mixtures of experts models as an analytic tool.
  • Publication
    Combining biomarker and food intake data
    Recent developments in biomarker discovery have demonstrated that combining biomarkers with self-reported intake data has the potential to improve estimation of food intake. Here, statistical methods for combining biomarker and self-reported food intake data are discussed. The calibration equations method is a widely applied method that corrects for measurement error in self-reported food intake data through the use of biomarker data. The method is outlined and illustrated through an example where citrus intake is estimated. In order to estimate stable calibration equations, a simulation-based framework is delineated which estimates the percentage of study subjects from whom biomarker data are required. The method of triads is frequently used to assess the validity of self-reported food intake data by combining them with biomarker data. The method is outlined and its sensitivity to the underlying assumptions is illustrated through simulation studies.
  • Publication
    Clustering ranked preference data using sociodemographic covariates
    Ranked preference data arise when a set of judges rank, in order of their preference, a set of objects. Such data arise in preferential voting systems and market research surveys. Covariate data associated with the judges are also often recorded. Such covariate data should be used in conjunction with preference data when drawing inferences about judges. To cluster a population of judges, the population is modelled as a collection of homogeneous groups. The Plackett-Luce model for ranked data is employed to model a judge’s ranked preferences within a group. A mixture of Plackett-Luce models is employed to model the population of judges, where each component in the mixture represents a group of judges. Mixture of experts models provide a framework in which covariates are included in mixture models. Covariates are included through the mixing proportions and the component density parameters. A mixture of experts model for ranked preference data is developed by combining a mixture of experts model and a mixture of Plackett-Luce models. Particular attention is given to the manner in which covariates enter the model. The mixing proportions and group specific parameters are potentially dependent on covariates. Model selection procedures are employed to choose optimal models. Model parameters are estimated via the ‘EMM algorithm’, a hybrid of the Expectation-Maximization and the Minorization-Maximization algorithms. Examples are provided through a menu survey and through Irish election data. Results indicate mixture modelling using covariates is insightful when examining a population of judges who express preferences.
  • Publication
    Clustering Ordinal Data via Latent Variable Models
    Item response modelling is a well established method for analysing ordinal response data. Ordinal data are typically collected as responses to a number of questions or items. The observed data can be viewed as discrete versions of an underlying latent Gaussian variable. Item response models assume that this latent variable (and therefore the observed ordinal response) is a function of both respondent specific and item specific parameters. However, item response models assume a homogeneous population in that the item specific parameters are assumed to be the same for all respondents. Often a population is heterogeneous and clusters of respondents exist; members of different clusters may view the items differently. A mixture of item response models is developed to provide clustering capabilities in the context of ordinal response data. The model is estimated within the Bayesian paradigm and is illustrated through an application to an ordinal response data set resulting from a clinical trial involving self-assessment of arthritis.
    Scopus© Citations 10  750
  • Publication
    A dynamic probabilistic principal components model for the analysis of longitudinal metabolomics data
    In a longitudinal metabolomics study, multiple metabolites are measured from several observations at many time points. Interest lies in reducing the dimensionality of such data and in highlighting influential metabolites which change over time. A dynamic probabilistic principal components analysis model is proposed to achieve dimension reduction while appropriately modelling the correlation due to repeated measurements. This is achieved by assuming an auto-regressive model for some of the model parameters. Linear mixed models are subsequently used to identify influential metabolites which change over time. The model proposed is used to analyse data from a longitudinal metabolomics animal study.
    Scopus© Citations 17  758
  • Publication
    Model Based Clustering for Mixed Data: clustMD
    A model based clustering procedure for data of mixed type, clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. clustMD employs a parsimonious covariance structure for the latent variables, leading to a suite of six clustering models that vary in complexity and provide an elegant and unified approach to clustering mixed data. An expectation maximisation (EM) algorithm is used to estimate clustMD; in the presence of nominal data a Monte Carlo EM algorithm is required. The clustMD model is illustrated by clustering simulated mixed type data and prostate cancer patients, on whom mixed data have been recorded.
    Scopus© Citations 45  603
  • Publication
    Inferring food intake from multiple biomarkers using a latent variable model
    (Institute of Mathematical Statistics, 2021-12)
    Metabolomic-based approaches have gained much attention in recent years due to their promising potential to deliver objective tools for assessment of food intake. In particular, multiple biomarkers have emerged for single foods. However, there is a lack of statistical tools available for combining multiple biomarkers to quantitatively infer food intake. Furthermore, there is a paucity of approaches for estimating the uncertainty around biomarker-based inferred intake. Here, to estimate the relationship between multiple metabolomic biomarkers and food intake in an intervention study conducted under the A-DIET research programme, a latent variable model, multiMarker, is proposed. The multiMarker model integrates factor analytic and mixture of experts models: the observed biomarker values are related to intake, which is described as a continuous latent variable that follows a flexible mixture of experts model with Gaussian components. The multiMarker model also facilitates inference on the latent intake when only biomarker data are subsequently observed. A Bayesian hierarchical modelling framework provides flexibility to adapt to different biomarker distributions and facilitates inference of the latent intake along with its associated uncertainty. Simulation studies are conducted to assess the performance of the multiMarker model, prior to its use in the motivating application of quantifying apple intake.
    Scopus© Citations 2  71
  • Publication
    A mixture of experts model for rank data with applications in election studies
    (Institute of Mathematical Statistics, 2008-12)
    A voting bloc is defined to be a group of voters who have similar voting preferences. The cleavage of the Irish electorate into voting blocs is of interest. Irish elections employ a 'single transferable vote' electoral system; under this system voters rank some or all of the electoral candidates in order of preference. These rank votes provide a rich source of preference information from which inferences about the composition of the electorate may be drawn. Additionally, the influence of social factors or covariates on the electorate composition is of interest. A mixture of experts model is a mixture model in which the model parameters are functions of covariates. A mixture of experts model for rank data is developed to provide a model-based method to cluster Irish voters into voting blocs, to examine the influence of social factors on this clustering and to examine the characteristic preferences of the voting blocs. The Benter model for rank data is employed as the family of component densities within the mixture of experts model; generalized linear model theory is employed to model the influence of covariates on the mixing proportions. Model fitting is achieved via a hybrid of the EM and MM algorithms. An example of the methodology is illustrated by examining an Irish presidential election. The existence of voting blocs in the electorate is established and it is determined that age and government satisfaction levels are important factors in influencing voting in this election.
    Scopus© Citations 81  332
  • Publication
    Computational Aspects of Fitting Mixture Models via the Expectation-Maximization Algorithm
    The Expectation–Maximization (EM) algorithm is a popular tool in a wide variety of statistical settings, in particular in the maximum likelihood estimation of parameters when clustering using mixture models. A serious pitfall is that in the case of a multimodal likelihood function the algorithm may become trapped at a local maximum, resulting in an inferior clustering solution. In addition, convergence to an optimal solution can be very slow. Methods are proposed to address these issues: optimizing starting values for the algorithm and targeting maximization steps efficiently. It is demonstrated that these approaches can produce superior outcomes to initialization via random starts or hierarchical clustering and that the rate of convergence to an optimal solution can be greatly improved.
    Scopus© Citations 34  634
  • Publication
    Clustering with the multivariate normal inverse Gaussian distribution
    Many model-based clustering methods are based on a finite Gaussian mixture model. The Gaussian mixture model implies that the data scatter within each group is elliptically shaped. Hence non-elliptical groups are often modeled by more than one component, resulting in model over-fitting. An alternative is to use a mean–variance mixture of multivariate normal distributions with an inverse Gaussian mixing distribution (MNIG) in place of the Gaussian distribution, to yield a more flexible family of distributions. Under this model the component distributions may be skewed and have fatter tails than the Gaussian distribution. The MNIG based approach is extended to include a broad range of eigendecomposed covariance structures. Furthermore, MNIG models where the other distributional parameters are constrained are considered. The Bayesian Information Criterion is used to identify the optimal model and number of mixture components. The method is demonstrated on three sample data sets and a novel variation on the univariate Kolmogorov–Smirnov test is used to assess goodness of fit.
    Scopus© Citations 60  17505