University College Dublin

Genetic classification of populations using supervised learning

Author(s)
Bridges, Michael  
Heron, Elizabeth A.  
O'Dushlaine, Colm  
Segurado, Ricardo  
Morris, Derek  
Corvin, Aiden  
Gill, Michael  
Pinto, Carlos  
URI
http://hdl.handle.net/10197/4378
Date Issued
2011-05-12
Date Available
2013-06-20T13:46:51Z
Abstract
There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large-scale genome-wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large-scale genome-wide association studies.
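To illustrate the supervised/unsupervised distinction drawn in the abstract, here is a minimal sketch of a classifier that, unlike principal components analysis, uses the known population label of each training individual. It then reports sensitivity on held-out individuals. This is a hypothetical example, not the authors' method. It replaces the paper's neural networks and support vector machines with a single-layer perceptron (the simplest linear neural network). It also assumes genotypes are encoded as minor-allele counts (0, 1 or 2) per SNP and uses a small made-up dataset. The names `train_perceptron`, `predict` and `sensitivity` are illustrative only.

```python
"""Hypothetical sketch: supervised classification of individuals into two
pre-defined populations from genotype data, using a single-layer perceptron
as a simple stand-in for the neural networks / SVMs used in the paper."""
from typing import List, Sequence, Tuple

# Assumed encoding: one minor-allele count (0, 1 or 2) per SNP.
Genotype = Sequence[int]


def train_perceptron(xs: List[Genotype], ys: List[int],
                     max_epochs: int = 100) -> Tuple[List[float], float]:
    """Train a perceptron; labels are +1 (population A) or -1 (population B)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side of (or on) the boundary: nudge towards y
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training individual is classified correctly
            break
    return w, b


def predict(w: List[float], b: float, x: Genotype) -> int:
    """Assign an individual to population A (+1) or B (-1)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


def sensitivity(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Sensitivity = TP / (TP + FN) for the chosen positive population."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Made-up toy data: population A carries more minor alleles at SNP 0,
# population B at SNP 1; SNP 2 is uninformative.
train_x = [[2, 0, 1], [2, 1, 1], [1, 0, 0],   # population A (+1)
           [0, 2, 1], [1, 2, 1], [0, 1, 0]]   # population B (-1)
train_y = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(train_x, train_y)

test_x = [[2, 0, 0], [2, 1, 0], [0, 1, 2],    # held-out population A
          [0, 2, 2], [1, 2, 0]]               # held-out population B
test_y = [1, 1, 1, -1, -1]
pred = [predict(w, b, x) for x in test_x]

print("weights:", w, "bias:", b)
print("sensitivity for population A:", sensitivity(test_y, pred))
```

The perceptron stops as soon as one pass over the training data produces no errors, which is guaranteed here because the toy populations are linearly separable. For real genome-wide data that need not be separable, the `max_epochs` cap bounds training, and the more flexible classifiers used in the paper would be the relevant comparison.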
Type of Material
Journal Article
Publisher
Public Library of Science
Journal
PLoS ONE
Volume
6
Issue
5
Copyright (Published Version)
2011 Bridges et al.
Subjects
Population genetics
Genetic differences
Machine learning
DOI
10.1371/journal.pone.0014802
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
File(s)
Name
Bridges_2011.pdf
Size
680.35 KB
Format
Adobe PDF
Checksum (MD5)
23f00380281034227828de12363bd41f

Owning collection
Public Health, Physiotherapy and Sports Science Research Collection

Item descriptive metadata is released under a CC-0 (public domain) license: https://creativecommons.org/public-domain/cc0/.
All other content is subject to copyright.

For all queries please contact research.repository@ucd.ie.
