The impact of guide trees in large-scale protein multiple sequence alignments
|Title:||The impact of guide trees in large-scale protein multiple sequence alignments|
|Authors:||Boyce, Kieran|
|Advisor:||Higgins, Desmond G|
|Permanent link:||http://hdl.handle.net/10197/8698|
|Date:||2016|
|Abstract:||The focus of this thesis is on large-scale progressive protein multiple sequence alignment algorithms. Although first developed over 30 years ago, multiple sequence alignment algorithms are still an active area of research, given their widespread use in many biological analyses and the dramatic increase in sequence information over the years. This work examines the behaviour of the existing algorithms with large numbers of sequences, and in particular the impact of guide trees on the alignments generated. This thesis is divided into five chapters. Chapter 1 introduces the concept of a multiple sequence alignment, its uses and how it is constructed. It also details the specifics of progressive alignments, describes how guide trees are constructed, and provides an overview of a number of the ways in which the quality of an alignment can be measured. Chapter 2 examines the impact the topology of the guide tree has on the generated alignment. It finds that simply aligning sequences one after another can produce higher quality alignments than the default alignment methods when measured using structure-based benchmarks. This increase in quality is particularly noticeable with larger alignments. It also finds that randomly ordering the sequences produces alignments of similar quality to any of the other orderings examined. Chapter 3 finds that, because of a tradeoff between alignment accuracy and computation time, larger alignments generated by some of the most common multiple sequence alignment programs are inherently unstable: changing the order in which the sequences are listed in the input file causes a different alignment to be created. Chapter 4 proposes an ordering of the sequences to be aligned that will produce a better quality alignment than the random ordering identified in Chapter 2. It also attempts to resolve the instability issue identified in the previous chapter. Finally, Chapter 5 reviews the findings presented in the thesis, and proposes possible future steps to both use and continue to develop these findings.|
|Type of material:||Doctoral Thesis|
|Publisher:||University College Dublin. School of Medicine|
|Qualification Name:||Ph.D.|
|Copyright (published version):||2016 the author|
|Keywords:||Algorithms; Bioinformatics; Computational Biology; Guide Trees; Multiple Sequence Alignments; Proteins|
|Other versions:||http://dissertations.umi.com/ucd:10128|
|Language:||en|
|Status of Item:||Peer reviewed|
|Appears in Collections:||Medicine Theses|
This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.