Clustering high‐dimensional mixed data to uncover sub‐phenotypes: joint analysis of phenotypic and genotypic data

Files in This Item:
File: Clustering high dimentional mixed datas to uncover sub phenotypes.pdf
Size: 385.2 kB
Format: Adobe PDF
Title: Clustering high‐dimensional mixed data to uncover sub‐phenotypes: joint analysis of phenotypic and genotypic data
Authors: McParland, Damien; Phillips, Catherine; Brennan, Lorraine; Roche, Helen M.; Gormley, Isobel Claire
Permanent link: http://hdl.handle.net/10197/10873
Date: 30-Jun-2017
Online since: 2019-07-10T10:44:55Z
Abstract: The LIPGENE-SU.VI.MAX study, like many others, recorded high-dimensional continuous phenotypic data and categorical genotypic data. LIPGENE-SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE-SU.VI.MAX participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model that elegantly accommodates high dimensional, mixed data is developed to cluster LIPGENE-SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an approximate BIC-MCMC criterion is developed to select the optimal model. Two clusters or sub-phenotypes ('healthy' and 'at risk') are uncovered. A small subset of variables is deemed discriminatory, which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, 7 years after the LIPGENE-SU.VI.MAX data were collected, participants underwent further analysis to diagnose presence or absence of the MetS. The two uncovered sub-phenotypes strongly correspond to the 7-year follow-up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition.
Funding Details: Science Foundation Ireland; Insight Research Centre; European Commission FP6
Type of material: Journal Article
Publisher: Wiley Online Library
Journal: Statistics in Medicine
Volume: 36
Issue: 28
Start page: 4548
End page: 4569
Copyright (published version): 2017 Wiley
Keywords: Clustering; Mixed data; Phenotypic data; SNP data; Metabolic syndrome
DOI: 10.1002/sim.7371
Language: en
Status of Item: Peer reviewed
This item is made available under a Creative Commons License: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
Appears in Collections: Conway Institute Research Collection
Mathematics and Statistics Research Collection
Institute of Food and Health Research Collection
Public Health, Physiotherapy and Sports Science Research Collection
Insight Research Collection
Agriculture and Food Science Research Collection
