  • Publication
    Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications
    (Institute of Mathematical Statistics, 2010-03)
    Food authenticity studies are concerned with determining whether food samples have been correctly labelled. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labelled and unlabelled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.
  • Publication
    mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models
    (R Foundation for Statistical Computing, 2016-08-01)
    Finite mixture models are being used increasingly to model a wide variety of random phenomena for clustering, classification and density estimation. mclust is a powerful and popular package which allows modelling of data as a Gaussian finite mixture with different covariance structures and different numbers of mixture components, for a variety of purposes of analysis. Recently, version 5 of the package has been made available on CRAN. This updated version adds new covariance structures, dimension reduction capabilities for visualisation, model selection criteria, initialisation strategies for the EM algorithm, and bootstrap-based inference, making it a full-featured R package for data analysis via finite mixture modelling.
  • Publication
    Adaptive Incremental Mixture Markov chain Monte Carlo
    We propose Adaptive Incremental Mixture Markov chain Monte Carlo (AIMM), a novel approach to sample from challenging probability distributions defined on a general state-space. While adaptive MCMC methods usually update a parametric proposal kernel with a global rule, AIMM locally adapts a semiparametric kernel. AIMM is based on an independent Metropolis-Hastings proposal distribution which takes the form of a finite mixture of Gaussian distributions. Central to this approach is the idea that the proposal distribution adapts to the target by locally adding a mixture component when the discrepancy between the proposal mixture and the target is deemed to be too large. As a result, the number of components in the mixture proposal is not fixed in advance. Theoretically, we prove that there exists a process that can be made arbitrarily close to AIMM and that converges to the correct target distribution. We also illustrate that it performs well in practice in a variety of challenging situations, including high-dimensional and multimodal target distributions.
  • Publication
    Properties of Latent Variable Network Models
    (Cambridge University Press, 2016-12-12)
    We derive properties of Latent Variable Models for networks, a broad class of models that includes the widely-used Latent Position Models. These include the average degree distribution, clustering coefficient, average path length and degree correlations. We introduce the Gaussian Latent Position Model, and derive analytic expressions and asymptotic approximations for its network properties. We pay particular attention to one special case, the Gaussian Latent Position Models with Random Effects, and show that it can represent the heavy-tailed degree distributions, positive asymptotic clustering coefficients and small-world behaviours that are often observed in social networks. Several real and simulated examples illustrate the ability of the models to capture important features of observed networks.
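The abstract for "Adaptive Incremental Mixture Markov chain Monte Carlo" above describes an independent Metropolis-Hastings sampler whose proposal is a finite mixture of Gaussians. The following is a minimal sketch of that building block only, in one dimension and with a fixed mixture: it does not implement the incremental step in which AIMM adds components, and it is not the authors' code. The bimodal target, the proposal parameters, and all function names are assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Density of a finite Gaussian mixture."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def mixture_sample(rng, weights, means, sds):
    """Draw one value: pick a component by weight, then sample from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], sds[k])


def independent_mh(target_pdf, weights, means, sds, n_iter, x0=0.0, seed=0):
    """Independent Metropolis-Hastings with a fixed Gaussian mixture proposal.

    target_pdf may be unnormalised. Returns the chain and the acceptance rate.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_iter):
        y = mixture_sample(rng, weights, means, sds)
        # Acceptance ratio for an independence sampler:
        # [pi(y) q(x)] / [pi(x) q(y)], where q does not depend on the current state.
        num = target_pdf(y) * mixture_pdf(x, weights, means, sds)
        den = target_pdf(x) * mixture_pdf(y, weights, means, sds)
        if den == 0.0 or rng.random() < min(1.0, num / den):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter


# Hypothetical bimodal target: equal mixture of N(-3, 1) and N(3, 1).
def target(x):
    return 0.5 * normal_pdf(x, -3.0, 1.0) + 0.5 * normal_pdf(x, 3.0, 1.0)


# Hypothetical proposal: two slightly wider components centred on the modes.
samples, rate = independent_mh(
    target,
    weights=[0.5, 0.5],
    means=[-3.0, 3.0],
    sds=[1.5, 1.5],
    n_iter=20000,
)
print(f"acceptance rate: {rate:.2f}")
print(f"sample mean: {sum(samples) / len(samples):.2f}")
```

Because the proposal here already covers both modes, the chain moves between them freely. With a poorly matched proposal the acceptance rate would drop, which is the situation the abstract says AIMM addresses by adding a component where the proposal and target disagree.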