University College Dublin

Ab initio and homology based prediction of protein domains by recursive neural networks

Author(s)
Walsh, Ian  
Martin, Alberto J. M.  
Mooney, Catherine  
Rubagotti, Enrico  
Vullo, Alessandro  
Pollastri, Gianluca  
URI
http://hdl.handle.net/10197/3396
Date Issued
2009-06-26
Date Available
2011-12-12T12:00:30Z
Abstract
Background: Proteins, especially larger ones, are often composed of individual evolutionary units, domains, which have their own function and structural fold. Predicting domains is an important intermediate step in protein analyses, including the prediction of protein structures.
Results: We describe novel systems for the prediction of protein domain boundaries powered by Recursive Neural Networks. The systems rely on a combination of primary sequence and evolutionary information, predictions of structural features such as secondary structure, solvent accessibility and residue contact maps, and structural templates, both annotated for domains (from the SCOP dataset) and unannotated (from the PDB). We gauge the contribution of contact maps, and of PDB and SCOP templates, independently and for different ranges of template quality. We find that accurately predicted contact maps are informative for the prediction of domain boundaries, while the same is not true for contact maps predicted ab initio. We also find that gap information from PDB templates is informative but, not surprisingly, less so than SCOP annotations. We test both systems trained on templates of all qualities and systems trained only on templates of marginal similarity to the query (less than 25% sequence identity). While the first batch of systems produces near-perfect predictions in the presence of fair to good templates, the second batch outperforms or matches ab initio predictors down to essentially any level of template quality.

We test all systems in 5-fold cross-validation on a large non-redundant set of multi-domain and single domain proteins. The final predictors are state-of-the-art, with a template-less prediction boundary recall of 50.8% (precision 38.7%) within ± 20 residues and a single domain recall of 80.3% (precision 78.1%). The SCOP-based predictors achieve a boundary recall of 74% (precision 77.1%) again within ± 20 residues, and classify single domain proteins as such in over 85% of cases, when we allow a mix of bad and good quality templates. If we only allow marginal templates (max 25% sequence identity to the query) the scores remain high, with boundary recall and precision of 59% and 66.3%, and 80% of all single domain proteins predicted correctly.
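The boundary recall and precision figures above are stated within a tolerance of ± 20 residues. As a minimal sketch of how such scores could be computed, the Python below counts a true boundary as recalled when some predicted boundary lies within the tolerance, and a predicted boundary as precise when it lies within the tolerance of some true boundary. This matching rule, the function name `boundary_scores`, and the example positions are assumptions made for illustration; the abstract does not specify the exact evaluation protocol used by the authors.

```python
from typing import Sequence, Tuple


def boundary_scores(
    true_boundaries: Sequence[int],
    predicted_boundaries: Sequence[int],
    tolerance: int = 20,
) -> Tuple[float, float]:
    """Return (recall, precision) for domain boundary positions.

    Assumed matching rule (not taken from the paper): a boundary counts
    as matched if any boundary of the other set lies within
    `tolerance` residues of it.
    """
    # True boundaries that have at least one prediction close enough.
    recalled = sum(
        1
        for t in true_boundaries
        if any(abs(t - p) <= tolerance for p in predicted_boundaries)
    )
    # Predicted boundaries that fall close enough to some true boundary.
    correct = sum(
        1
        for p in predicted_boundaries
        if any(abs(p - t) <= tolerance for t in true_boundaries)
    )
    # Empty sets yield 0.0 here by convention to avoid division by zero.
    recall = recalled / len(true_boundaries) if true_boundaries else 0.0
    precision = correct / len(predicted_boundaries) if predicted_boundaries else 0.0
    return recall, precision


if __name__ == "__main__":
    # Hypothetical residue positions, not data from the paper.
    recall, precision = boundary_scores([100, 250], [110, 400])
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this hypothetical example, the prediction at residue 110 matches the true boundary at 100, while the true boundary at 250 and the prediction at 400 remain unmatched. Scores over a whole test set would be obtained by summing the matched and total counts across proteins before dividing.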
Conclusion: The systems presented here may prove useful in large-scale annotation of protein domains in proteins of unknown structure. The methods are available as public web servers at http://distill.ucd.ie/shandy/, and we plan on running them on a multi-genomic scale and making the results public in the near future.
Sponsorship
Science Foundation Ireland
Health Research Board
Other Sponsorship
UCD President's Award 2004
Type of Material
Journal Article
Publisher
BioMed Central
Journal
BMC Bioinformatics
Volume
10
Issue
195
Copyright (Published Version)
2009 Walsh et al; licensee BioMed Central Ltd.
Subjects

RNN

Neural networks

Protein domain predic...

Subject – LCSH
Neural networks (Computer science)
Proteins--Structure
DOI
10.1186/1471-2105-10-195
Web versions
http://www.biomedcentral.com/1471-2105/10/195
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-sa/1.0/
File(s)
Name

Walsh_domains_2009.pdf

Size

570.82 KB

Format

Adobe PDF

Checksum (MD5)

ba43474bf1bbecf18e869818b7e92fcb

Owning collection
Computer Science Research Collection
Mapped collections
CASL Research Collection

Item descriptive metadata is released under a CC-0 (public domain) license: https://creativecommons.org/public-domain/cc0/.
All other content is subject to copyright.

For all queries please contact research.repository@ucd.ie.
