Making automated multiple alignments of very large numbers of protein sequences

Files in This Item:
Bioinformatics-2013-Sievers-989-95.pdf (710.68 kB, Adobe PDF)
Title: Making automated multiple alignments of very large numbers of protein sequences
Authors: Sievers, Fabian
Dineen, David
Wilm, Andreas
Higgins, D. (Des)
Date: 21-Feb-2013
Abstract: Motivation: Recent developments in sequence alignment software have made possible multiple sequence alignments (MSAs) of >100 000 sequences in reasonable times. At present, there are no systematic analyses of how alignment quality scales as the number of aligned sequences is increased. Results: We benchmarked a range of widely used MSA packages using a selection of protein families with some known structures and found that the accuracy of such alignments decreases markedly as the number of sequences grows. This is more or less true of all packages and protein families. The phenomenon is mostly due to the accumulation of alignment errors, rather than problems in guide-tree construction. It is partly alleviated by using iterative refinement or by selectively adding sequences. The average accuracy of progressive methods, as measured against structure-based benchmarks, can be improved by incorporating information derived from high-quality structural alignments of sequences with solved structures. This suggests that the availability of high-quality curated alignments will have to complement algorithmic and/or software developments in the long term.
Funding Details: Science Foundation Ireland
Type of material: Journal Article
Publisher: Oxford University Press
Copyright (published version): 2013 the Author
Keywords: DNA sequencing; Sequence analysis
DOI: 10.1093/bioinformatics/btt093
Language: en
Status of Item: Peer reviewed
Appears in Collections: Conway Institute Research Collection
Medicine Research Collection

This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this item is made available, or to notes contained in the item itself. Other terms may apply.