Beyond the twilight zone: automated prediction of structural properties of proteins by recursive neural networks and remote homology information

Files in This Item:
File: MooneyPollastri2009.pdf
Size: 170.86 kB
Format: Adobe PDF
Title: Beyond the twilight zone: automated prediction of structural properties of proteins by recursive neural networks and remote homology information
Authors: Mooney, Catherine; Pollastri, Gianluca
Permanent link: http://hdl.handle.net/10197/3442
Date: Oct-2009
Abstract: The prediction of 1D structural properties of proteins is an important step toward the prediction of protein structure and function, not only in the ab initio case but also when homology information to known structures is available. Despite this, the vast majority of 1D predictors do not incorporate homology information into the prediction process. We develop a novel structural alignment method, SAMD, which we use to build alignments of putative remote homologues that we compress into templates of structural frequency profiles. We use these templates as additional input to ensembles of recursive neural networks, which we specialise for the prediction of query sequences that show only remote homology to any Protein Data Bank structure. We predict four 1D structural properties – secondary structure, relative solvent accessibility, backbone structural motifs, and contact density. Secondary structure prediction accuracy, tested by five-fold cross-validation on a large set of proteins allowing less than 25% sequence identity between training and test sets and between query sequences and templates, exceeds 82%, outperforming its ab initio counterpart, other state-of-the-art secondary structure predictors (Jpred 3 and PSIPRED) and two other systems based on PSI-BLAST and COMPASS templates. We show that structural information from homologues improves prediction accuracy well beyond the Twilight Zone of sequence similarity, even below 5% sequence identity, for all four structural properties. Significant improvement over the extraction of structural information directly from PDB templates suggests that the combination of sequence and template information is more informative than templates alone.
Funding Details: Science Foundation Ireland; Health Research Board
Type of material: Journal Article
Publisher: Wiley InterScience
Copyright (published version): 2009 Wiley-Liss, Inc.
Keywords: Alignments;Homology detection;Secondary structure;Solvent accessibility;Machine learning
Subject LCSH: Sequence alignment (Bioinformatics);Homology (Biology);Machine learning
DOI: 10.1002/prot.22429
Language: en
Status of Item: Peer reviewed
Appears in Collections: Computer Science Research Collection; CASL Research Collection

This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.