University College Dublin
Investigation of parameter uncertainty in clustering using a Gaussian mixture model via jackknife, bootstrap and weighted likelihood bootstrap

Author(s)
O'Hagan, Adrian  
Murphy, Thomas Brendan  
Scrucca, Luca  
Gormley, Isobel Claire  
Uri
http://hdl.handle.net/10197/10942
Date Issued
2019-05-28
Date Available
2019-07-22T13:12:50Z
Abstract
Mixture models with (multivariate) Gaussian components are a popular tool in model-based clustering. Such models are often fitted by a procedure that maximizes the likelihood, such as the EM algorithm. At convergence, the maximum likelihood parameter estimates are typically reported, but in most cases little emphasis is placed on the variability associated with these estimates. In part this may be because standard errors are not directly calculated in the model-fitting algorithm, either because they are not required to fit the model or because they are difficult to compute. The examination of standard errors in model-based clustering is therefore typically neglected. Sampling-based methods, such as the jackknife (JK), bootstrap (BS) and parametric bootstrap (PB), are intuitive, generalizable approaches to assessing parameter uncertainty in model-based clustering using a Gaussian mixture model. This paper provides a review and empirical comparison of the jackknife, bootstrap and parametric bootstrap methods for producing standard errors and confidence intervals for mixture parameters. The performance of such sampling methods in the presence of small and/or overlapping clusters requires consideration, however; here the weighted likelihood bootstrap (WLBS) approach is demonstrated to be effective in addressing this concern in a model-based clustering framework. The JK, BS, PB and WLBS methods are illustrated and contrasted through simulation studies and through the traditional Old Faithful and Thyroid data sets. The MclustBootstrap function, available in the most recent release of the popular R package mclust, facilitates the implementation of the JK, BS, PB and WLBS approaches to estimating parameter uncertainty in the context of model-based clustering.
The JK, WLBS and PB approaches to variance estimation are shown to be robust and to provide good coverage across a range of real and simulated data sets when performing model-based clustering, but care is advised when using the BS in such settings. In the case of poor model fit (for example, for data with small and/or overlapping clusters), the JK and BS are found to suffer from being unable to fit the specified model in many of the sub-samples formed. The PB also suffers when model fit is poor, since it relies on data sets simulated from the fitted model as the basis for the variance estimation calculations. However, the WLBS will generally provide a robust solution, driven by the fact that all observations are represented with some weight in each of the sub-samples formed under this approach.
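The contrast the abstract draws between the BS and the WLBS can be illustrated with a small sketch. The code below is not the authors' implementation and does not use MclustBootstrap or mclust; it is a standard-library Python illustration for a univariate two-component Gaussian mixture. All function names are hypothetical, and several choices are assumptions: WLBS weights are drawn as normalized exponential variates (a flat Dirichlet), BS resampling is expressed as multinomial count weights, each replicate's EM is started from the full-data estimates, and components are ordered by mean to avoid label switching.

```python
import math
import random
import statistics


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def weighted_em(data, weights, init, n_iter=100, tol=1e-6):
    """Weighted EM for a 1-D Gaussian mixture; init = (props, means, variances)."""
    props, means, variances = (list(p) for p in init)
    k = len(props)
    total_w = sum(weights)
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities and the weighted log-likelihood.
        resp, ll = [], 0.0
        for x, w in zip(data, weights):
            dens = [props[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            s = max(sum(dens), 1e-300)  # guard against log(0)
            ll += w * math.log(s)
            resp.append([d / s for d in dens])
        # M-step: each observation contributes in proportion to its weight.
        for j in range(k):
            wj = max(sum(w * r[j] for w, r in zip(weights, resp)), 1e-12)
            props[j] = wj / total_w
            means[j] = sum(w * r[j] * x for x, w, r in zip(data, weights, resp)) / wj
            ss = sum(w * r[j] * (x - means[j]) ** 2 for x, w, r in zip(data, weights, resp))
            variances[j] = max(ss / wj, 1e-6)  # floor avoids a collapsed component
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return props, means, variances


def draw_weights(n, rng, scheme):
    """Observation weights for one replicate (assumed schemes, see lead-in)."""
    if scheme == "wlbs":
        # Normalized exponentials: every observation keeps a positive weight.
        raw = [rng.expovariate(1.0) for _ in range(n)]
        scale = n / sum(raw)
        return [g * scale for g in raw]
    # Nonparametric bootstrap as multinomial counts: some weights are zero.
    counts = [0.0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1.0
    return counts


def bootstrap_se(data, mle, scheme, n_boot=50, seed=0):
    """Standard errors of the component means across weighted refits."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        weights = draw_weights(len(data), rng, scheme)
        _, means, _ = weighted_em(data, weights, mle)
        draws.append(sorted(means))  # order by mean to avoid label switching
    return [statistics.stdev(col) for col in zip(*draws)]


# Simulated data: two well-separated clusters (illustrative values only).
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(60)] + [rng.gauss(5.0, 1.0) for _ in range(40)]
init = ([0.5, 0.5], [min(data), max(data)], [1.0, 1.0])
mle = weighted_em(data, [1.0] * len(data), init)
se_wlbs = bootstrap_se(data, mle, "wlbs")
se_bs = bootstrap_se(data, mle, "bs")
print("means:", mle[1], "WLBS SE:", se_wlbs, "BS SE:", se_bs)
```

With well-separated clusters the two schemes should give similar standard errors. The difference the abstract describes arises when a cluster is small: a BS replicate can assign zero weight to most or all of that cluster's observations, whereas every observation retains positive weight under the WLBS.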
Sponsorship
Science Foundation Ireland
Other Sponsorship
Insight Research Centre
Type of Material
Journal Article
Publisher
Springer Science and Business Media LLC
Journal
Computational Statistics
Volume
34
Start Page
1779
End Page
1813
Copyright (Published Version)
2019 Springer
Subjects
Mclust
MclustBootstrap
Precision
Standard errors
Variance estimation
DOI
10.1007/s00180-019-00897-9
Grant ID
SFI/12/RC/2289
Language
English
Status of Item
Peer reviewed
ISSN
0943-4062
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
File(s)
Name
JKBS_paper.pdf
Size
3.47 MB
Format
Adobe PDF
Checksum (MD5)
451b1bf3fc0eed0e6ff8f9e5be1651e9
Owning collection
Mathematics and Statistics Research Collection
Mapped collections
Insight Research Collection

Item descriptive metadata is released under a CC-0 (public domain) license: https://creativecommons.org/public-domain/cc0/.
All other content is subject to copyright.
