Sequence embedding for fast construction of guide trees for multiple sequence alignment

Files in This Item:
 File: 1748-7188-5-21.pdf (1.04 MB, Adobe PDF)
Title: Sequence embedding for fast construction of guide trees for multiple sequence alignment
Authors: Blackshields, Gordon; Sievers, Fabian; Shi, Weifeng; Wilm, Andreas; Higgins, Desmond G
Permanent link: http://hdl.handle.net/10197/7307
Date: 14-May-2010
Online since: 2015-12-16T10:00:29Z
Abstract: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
Funding Details: Science Foundation Ireland
Type of material: Journal Article
Publisher: BioMed Central
Journal: Algorithms for Molecular Biology
Volume: 5
Issue: 21
Start page: 1
End page: 11
Copyright (published version): 2010 the Authors
Keywords: DNA sequencing; Input sequences; Embedding; mBed
DOI: 10.1186/1748-7188-5-21
Language: en
Status of Item: Peer reviewed
This item is made available under a Creative Commons License: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
Appears in Collections: Conway Institute Research Collection; Medicine Research Collection
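
Illustrative sketch: the abstract describes embedding sequences in a space where similarities can be approximated without computing all pair-wise distances. The Python below is a minimal, generic illustration of that idea, not the authors' mBed code. It assumes a simple landmark-style scheme in which each sequence becomes a vector of its distances to a few randomly chosen seed sequences. The names kmer_distance, embed and euclidean, the toy k-mer distance, and the random seed choice are all hypothetical and are not taken from the paper.

```python
import math
import random


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Toy distance (assumption): 1 - Jaccard similarity of the k-mer sets.

    Stands in for an expensive pairwise sequence distance.
    """
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not ka and not kb:
        return 0.0
    return 1.0 - len(ka & kb) / len(ka | kb)


def embed(seqs, n_seeds, distance=kmer_distance, rng=None):
    """Map each sequence to a vector of distances to n_seeds seed sequences.

    This needs N * n_seeds distance calls rather than the N * (N - 1) / 2
    calls of a full distance matrix. Seeds are picked at random here
    (assumption); other selection rules are possible.
    """
    rng = rng or random.Random(0)
    seeds = rng.sample(seqs, n_seeds)
    return [[distance(s, seed) for seed in seeds] for s in seqs]


def euclidean(u, v):
    """Cheap distance between two embedded vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC", "ACGTTTTT"]
vectors = embed(seqs, n_seeds=2)
```

The resulting vectors could then be handed to any clustering routine that works on points in a vector space, with euclidean used in place of the original sequence distance. How well such vectors preserve the true similarities depends on the number and choice of seeds.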

