University College Dublin

Clustering high‐dimensional mixed data to uncover sub‐phenotypes: joint analysis of phenotypic and genotypic data

Author(s)
McParland, Damien  
Phillips, Catherine  
Brennan, Lorraine  
Roche, Helen M.  
Gormley, Isobel Claire  
Uri
http://hdl.handle.net/10197/10873
Date Issued
2017-06-30
Date Available
2019-07-10T10:44:55Z
Abstract
The LIPGENE-SU.VI.MAX study, like many others, recorded high-dimensional continuous phenotypic data and categorical genotypic data. LIPGENE-SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE-SU.VI.MAX participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model that elegantly accommodates high dimensional, mixed data is developed to cluster LIPGENE-SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an approximate BIC-MCMC criterion is developed to select the optimal model. Two clusters or sub-phenotypes ('healthy' and 'at risk') are uncovered. A small subset of variables is deemed discriminatory, which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, 7 years after the LIPGENE-SU.VI.MAX data were collected, participants underwent further analysis to diagnose presence or absence of the MetS. The two uncovered sub-phenotypes strongly correspond to the 7-year follow-up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition.
Sponsorship
Science Foundation Ireland
Other Sponsorship
Insight Research Centre
European Commission FP6
Type of Material
Journal Article
Publisher
Wiley Online Library
Journal
Statistics in Medicine
Volume
36
Issue
28
Start Page
4548
End Page
4569
Copyright (Published Version)
2017 Wiley
Subjects
Clustering
Mixed data
Phenotypic data
SNP data
Metabolic syndrome
DOI
10.1002/sim.7371
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
File(s)
Name
Clustering high dimentional mixed datas to uncover sub phenotypes.pdf
Size
385.2 KB
Format
Adobe PDF
Checksum (MD5)
6c4615779694ce3df0127d30aeec9155

Owning collection
Insight Research Collection
Mapped collections
Agriculture and Food Science Research Collection
Conway Institute Research Collection
Institute of Food and Health Research Collection
Mathematics and Statistics Research Collection
Public Health, Physiotherapy and Sports Science Research Collection

Item descriptive metadata is released under a CC-0 (public domain) license: https://creativecommons.org/public-domain/cc0/.
All other content is subject to copyright.
