The impact of guide trees in large-scale protein multiple sequence alignments

Files in This Item:
File: Boyce_ucd_5090D_10128.pdf
Size: 3.32 MB
Format: Adobe PDF
Title: The impact of guide trees in large-scale protein multiple sequence alignments
Authors: Boyce, Kieran
Advisor: Higgins, Desmond G
Permanent link:
Date: 2016
Online since: 2017-08-27T01:00:26Z
Abstract: The focus of this thesis is on large-scale progressive protein multiple sequence alignment algorithms. Although first developed over 30 years ago, multiple sequence alignment algorithms are still an active area of research, given their widespread use in many biological analyses and the dramatic increase in sequence information over the years. This work examines the behaviour of the existing algorithms with large numbers of sequences, and in particular the impact of guide trees on the alignments generated.
This thesis is divided into 5 chapters. Chapter 1 introduces the concept of a multiple sequence alignment, its uses and how it is constructed. It also details the specifics of progressive alignments, describes how guide trees are constructed, and provides an overview of a number of the ways in which the quality of an alignment can be measured.
Chapter 2 examines the impact the topology of the guide tree has on the generated alignment. It finds that simply aligning sequences one after another can produce higher quality alignments than the default alignment methods when measured using structure-based benchmarks. This increase in quality is particularly noticeable with larger alignments. It also finds that randomly ordering the sequences produces alignments of similar quality to those from any of the other orderings examined.
Chapter 3 finds that, because of a tradeoff between alignment accuracy and computation time, larger alignments generated by some of the most common multiple sequence alignment programs are inherently unstable: changing the order in which the sequences are listed in the input file causes a different alignment to be created.
Chapter 4 proposes an ordering of the sequences to be aligned that will produce a better quality alignment than the random ordering identified in Chapter 2. It also attempts to resolve the instability issue identified in the previous chapter.
Finally, Chapter 5 reviews the findings presented in the thesis, and proposes possible future steps to both use and continue to develop these findings.
Type of material: Doctoral Thesis
Publisher: University College Dublin. School of Medicine  
Qualification Name: Ph.D.
Copyright (published version): 2016 the author
Keywords: Algorithms; Bioinformatics; Computational Biology; Guide Trees; Multiple Sequence Alignments; Proteins
Other versions:
Language: en
Status of Item: Peer reviewed
This item is made available under a Creative Commons License:
Appears in Collections: Medicine Theses
