Model-based clustering with sparse covariance matrices

Files in This Item: insight_publication.pdf (1.05 MB, Adobe PDF)
Title: Model-based clustering with sparse covariance matrices
Authors: Fop, Michael; Murphy, Thomas Brendan; Scrucca, Luca
Permanent link: http://hdl.handle.net/10197/11364
Date: 2019
Online since: 2020-05-05T14:00:43Z
Abstract: Finite Gaussian mixture models are widely used for model-based clustering of continuous data. Nevertheless, since the number of model parameters scales quadratically with the number of variables, these models can be easily over-parameterized. For this reason, parsimonious models have been developed via covariance matrix decompositions or assuming local independence. However, these remedies do not allow for direct estimation of sparse covariance matrices nor do they take into account that the structure of association among the variables can vary from one cluster to the other. To this end, we introduce mixtures of Gaussian covariance graph models for model-based clustering with sparse covariance matrices. A penalized likelihood approach is employed for estimation and a general penalty term on the graph configurations can be used to induce different levels of sparsity and incorporate prior knowledge. Model estimation is carried out using a structural-EM algorithm for parameters and graph structure estimation, where two alternative strategies based on a genetic algorithm and an efficient stepwise search are proposed for inference. With this approach, sparse component covariance matrices are directly obtained. The framework results in a parsimonious model-based clustering of the data via a flexible model for the within-group joint distribution of the variables. Extensive simulated data experiments and application to illustrative datasets show that the method attains good classification performance and model quality. The general methodology for model-based clustering with sparse covariance matrices is implemented in the R package mixggm, available on CRAN.
Funding Details: Science Foundation Ireland
Other Sponsorship: Insight Research Centre
Type of material: Journal Article
Publisher: Springer
Journal: Statistics and Computing
Volume: 29
Issue: 4
Start page: 791
End page: 819
Copyright (published version): 2018 Springer
Keywords: Finite Gaussian mixture models; Gaussian graphical models; Genetic algorithm; Model-based clustering; Penalized likelihood; Sparse covariance matrices; Stepwise search; Structural-EM algorithm
DOI: 10.1007/s11222-018-9838-y
Language: en
Status of Item: Peer reviewed
Appears in Collections: Mathematics and Statistics Research Collection; Insight Research Collection
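
The abstract notes that the number of parameters in a Gaussian mixture grows quadratically with the number of variables. The sketch below is not taken from the paper or from the mixggm package. It is a minimal illustration of that arithmetic, assuming each component has its own mean vector and covariance matrix, and that in a covariance graph model each absent edge fixes one off-diagonal covariance at zero. The function names are hypothetical.

```python
def full_gmm_params(k, d):
    """Free parameters of a k-component Gaussian mixture on d variables
    with unrestricted component covariance matrices."""
    weights = k - 1                  # mixing proportions sum to one
    means = k * d                    # one mean vector per component
    covs = k * d * (d + 1) // 2      # symmetric d x d matrix per component
    return weights + means + covs


def sparse_gmm_params(d, edges_per_component):
    """Free parameters when component j keeps only edges_per_component[j]
    nonzero off-diagonal covariances (assumed: one parameter per edge)."""
    k = len(edges_per_component)
    max_edges = d * (d - 1) // 2
    if any(e < 0 or e > max_edges for e in edges_per_component):
        raise ValueError("edge count must lie between 0 and d*(d-1)/2")
    weights = k - 1
    means = k * d
    covs = sum(d + e for e in edges_per_component)  # d variances plus edges
    return weights + means + covs


if __name__ == "__main__":
    # Compare the unrestricted count with a sparse configuration
    # for an increasing number of variables.
    for d in (5, 10, 20):
        print(d, full_gmm_params(3, d), sparse_gmm_params(d, [d, d, d]))
```

Under these assumptions the covariance term dominates as the number of variables grows, which is the over-parameterization that sparse component covariance matrices are meant to limit.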

This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this item is made available, or to notes contained in the item itself. Other terms may apply.