- Publication: The impact of guide trees in large-scale protein multiple sequence alignments

  The focus of this thesis is on large-scale progressive protein multiple sequence alignment algorithms. Although first developed over 30 years ago, multiple sequence alignment algorithms are still an active area of research, given their widespread use in many biological analyses and the dramatic increase in sequence information over the years. This work examines the behaviour of the existing algorithms with large numbers of sequences, and in particular the impact of guide trees on the alignments generated.

  This thesis is divided into 5 chapters. Chapter 1 introduces the concept of a multiple sequence alignment, its uses and how it is constructed. It also details the specifics of progressive alignments, describes how guide trees are constructed, and provides an overview of several ways in which the quality of an alignment can be measured.

  Chapter 2 examines the impact the topology of the guide tree has on the generated alignment. It finds that simply aligning sequences one after another can produce higher quality alignments than the default alignment methods when measured using structure-based benchmarks. This increase in quality is particularly noticeable with larger alignments. It also finds that randomly ordering the sequences produces alignments of similar quality to any of the other orderings examined.

  Chapter 3 finds that, because of a trade-off between alignment accuracy and computation time, larger alignments generated by some of the most common multiple sequence alignment programs are inherently unstable, and changing the order in which the sequences are listed in the input file will cause a different alignment to be created.

  Chapter 4 proposes an ordering of the sequences to be aligned that will produce a better quality alignment than the random ordering identified in Chapter 2. It also attempts to resolve the instability issue identified in the previous chapter.

  Finally, Chapter 5 reviews the findings presented in the thesis, and proposes possible future steps to both use and continue to develop these findings.
- Publication: Instability in progressive multiple sequence alignment algorithms

  Background: Progressive alignment is the standard approach used to align large numbers of sequences. As with all heuristics, this involves a trade-off between alignment accuracy and computation time.

  Results: We examine this trade-off and find that, because of a loss of information in the early steps of the approach, the alignments generated by the most common multiple sequence alignment programs are inherently unstable, and simply reversing the order of the sequences in the input file will cause a different alignment to be generated. Although this effect is more obvious with larger numbers of sequences, it can also be seen with data sets on the order of one hundred sequences. We also outline a means of determining the number of sequences in a data set beyond which the probability of instability becomes more pronounced.

  Conclusions: This instability has major ramifications for both the designers of large-scale multiple sequence alignment algorithms and the users of these alignments.
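
The instability described in the abstract above (a different alignment when the input order is reversed) can be checked on any pair of alignments of the same sequences. The following is only a minimal sketch, not the method used in the publication: it assumes each alignment is already loaded as a mapping from sequence name to gapped row, and it scores agreement as the fraction of shared aligned residue pairs, a measure chosen here for illustration. The names `aligned_pairs` and `pair_agreement` are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` maps sequence name -> gapped row ('-' marks a gap).
    Residues are identified as (name, index in the ungapped sequence),
    so the result does not depend on the order of the rows.
    """
    names = sorted(alignment)
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("all rows must have the same length")
    n_cols = lengths.pop() if lengths else 0

    next_index = {name: 0 for name in names}
    pairs = set()
    for col in range(n_cols):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, next_index[name]))
                next_index[name] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def pair_agreement(aln_a, aln_b):
    """Fraction of aligned residue pairs common to both alignments.

    Illustrative assumption: intersection over union of the two pair sets.
    A value of 1.0 means the two alignments are identical.
    """
    pairs_a, pairs_b = aligned_pairs(aln_a), aligned_pairs(aln_b)
    union = pairs_a | pairs_b
    return 1.0 if not union else len(pairs_a & pairs_b) / len(union)


# Toy data standing in for the output of one aligner run on the
# original input order and on the reversed input order.
forward = {"s1": "AC-GT", "s2": "ACAGT", "s3": "A--GT"}
reverse = {"s1": "ACG-T", "s2": "ACAGT", "s3": "A--GT"}

print(pair_agreement(forward, forward))
print(pair_agreement(forward, reverse))
```

In practice the two mappings would come from running the same alignment program twice, once on the original file and once on the file with its sequences listed in reverse order; any agreement below 1.0 indicates that the output depends on the input order.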