University College Dublin

Sequence embedding for fast construction of guide trees for multiple sequence alignment

Author(s)
Blackshields, Gordon  
Sievers, Fabian  
Shi, Weifeng  
Wilm, Andreas  
Higgins, Desmond G  
URI
http://hdl.handle.net/10197/7307
Date Issued
2010-05-14
Date Available
2015-12-16T10:00:29Z
Abstract
The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
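The general idea in the abstract, representing each sequence by its distances to a small set of reference sequences instead of to every other sequence, can be illustrated with a minimal Python sketch. This is not the authors' mBed code. The k-mer Jaccard distance, the seed choice (every second sequence), and the names kmer_distance, embed and euclidean are all assumptions made for illustration only.

```python
import math


def kmer_set(seq: str, k: int = 2) -> set:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Jaccard distance between k-mer sets.

    Assumption: a cheap stand-in for whatever (expensive) sequence
    distance a real pipeline would use.
    """
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def embed(sequences, seeds, k=2):
    """Map each sequence to a vector of its distances to the seeds.

    Only len(sequences) * len(seeds) sequence distances are computed,
    rather than one for every pair of sequences.
    """
    return [[kmer_distance(s, ref, k) for ref in seeds] for s in sequences]


def euclidean(u, v):
    """Euclidean distance between two embedded vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


sequences = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA"]
seeds = sequences[::2]  # hypothetical seed choice: every second sequence
vectors = embed(sequences, seeds)

# Similar sequences end up close together in the embedded space.
near = euclidean(vectors[0], vectors[1])
far = euclidean(vectors[0], vectors[2])
```

The embedded vectors can then be passed to any vector-based clustering method to build a tree. With N sequences and a number of seeds much smaller than N, the count of sequence-to-sequence distance calculations grows roughly linearly in N rather than as N².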
Sponsorship
Science Foundation Ireland
Type of Material
Journal Article
Publisher
BioMed Central
Journal
Algorithms for Molecular Biology
Volume
5
Issue
21
Start Page
1
End Page
11
Copyright (Published Version)
2010 the Authors
Subjects
DNA sequencing
Input sequences
Embedding
mBed

DOI
10.1186/1748-7188-5-21
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
File(s)
Name
1748-7188-5-21.pdf
Size
1.01 MB
Format
Adobe PDF
Checksum (MD5)
861ee894a5fb9868bd62bcdf1b729f30
Owning collection
Medicine Research Collection
Mapped collections
Conway Institute Research Collection

Item descriptive metadata is released under a CC-0 (public domain) license: https://creativecommons.org/public-domain/cc0/.
All other content is subject to copyright.

For all queries please contact research.repository@ucd.ie.
