  • Publication
    Deep learning methods in protein structure prediction
    Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the 1960s, statistical methods, followed by increasingly complex Machine Learning and, recently, Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and the essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and the basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days to the computationally intensive, highly sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review by outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and by trying to anticipate what challenges and opportunities may arise next.
  • Publication
    Predicting Protein Structural Annotations by Deep and Shallow Learning
    (University College Dublin. School of Computer Science, 2020)
    0000-0003-3016-3655
    This thesis discusses the prediction of Protein Structural Annotations by Deep and Shallow Learning and the fundamental position of these Annotations in Structural Bioinformatics, and Bioinformatics in general. Proteins are profoundly characterised by their structure in every aspect of their functioning and, while over the last decades there has been a close to exponential growth in the number of known protein sequences, the growth of known protein structures has been closer to linear because of the high complexity and cost of determining them. Thus, Protein Structure Predictors are among the most thoroughly assessed tools in Bioinformatics (in venues such as CASP or CAMEO) because they allow the structural study of proteins on a large scale. This thesis presents the key types of Protein Structural Annotation and various Shallow and Deep Learning methods and algorithms for predicting them. From one-dimensional Protein Annotations – i.e. Secondary Structure, Solvent Accessibility and Torsion Angles – to more complex and informative two-dimensional protein abstractions – i.e. Contact and Distance maps – both mature and currently developing methods for Protein Structure Annotations are introduced. Particular attention is given to some of the best performing and freely available Deep and Shallow Learning methods to predict Protein Structure Annotations that I helped to develop. In particular, I carried out a very large study of Neural Network-based methods with the following settings: Shallow Learning was first employed with or without evolutionary information; more sophisticated approaches were then employed and refined step by step. This led to a robust state-of-the-art pipeline to predict Protein Structural Annotations by Deep Learning.
    Finally, I used the extensively studied problem of Secondary Structure Prediction to show how the accuracy of state-of-the-art predictors is strongly correlated with the level of similarity between training and test profiles extracted from evolutionary information. Based on this study, I propose a protocol to evaluate the accuracy of a predictor at the profile similarity level instead of the standard sequence level.
  • Publication
    Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction
    Protein Secondary Structure (SS) prediction has been a central topic of research in Bioinformatics for decades. In spite of this, even the most sophisticated ab initio SS predictors are not able to reach the theoretical limit of three-state prediction accuracy (88-90%), while only a few predict more than the 3 traditional Helix, Strand and Coil classes. In this study we present tests on different models trained both on single sequence and evolutionary profile-based inputs and develop a new state-of-the-art system, Porter 5. Porter 5 is composed of ensembles of cascaded Bidirectional Recurrent Neural Networks and Convolutional Neural Networks, incorporates new input encoding techniques and is trained on a large set of protein structures. Porter 5 achieves 84% accuracy (81% SOV) when tested on 3 classes and 73% accuracy (70% SOV) on 8 classes on a large independent set. In our tests Porter 5 is 2% more accurate than its previous version and outperforms or matches the most recent predictors of secondary structure we tested. When Porter 5 is retrained on SCOPe-based sets that eliminate homology between training and testing samples, we obtain similar results. Porter is available as a web server and standalone program at http://distilldeep.ucd.ie/porter/ alongside all the datasets and alignments.
  • Publication
    Protein Structure Annotations
    (Springer, 2019-03-28)
    This chapter aims to introduce the specifics of protein structure annotations and their fundamental position in structural bioinformatics, and in bioinformatics in general. Proteins are profoundly characterised by their structure in every aspect of their functioning and, while over the last decades there has been a close to exponential growth of known protein sequences, the growth of known protein structures has been closer to linear because of the high complexity and cost of determining them. Thus, protein structure predictors are among the most thoroughly assessed tools in bioinformatics (in venues such as CASP or CAMEO) because they allow the structural study of proteins on a large scale. This chapter presents the key types of protein structure annotation and the methods and algorithms for predicting them, with the aim of giving both a historical perspective on their development and a snapshot of their current state of the art. From one-dimensional protein annotations – i.e. secondary structure, solvent accessibility and torsion angles – to more complex and informative two-dimensional protein abstractions – i.e. contact maps – both mature and currently developing methods for protein structure annotations are introduced. The aim of this overview is to facilitate the adoption and development of state-of-the-art protein structural predictors. Particular attention is given to some of the best performing and freely available web servers and standalone programmes to predict protein structure annotations.
  • Publication
    PaleAle 5.0: prediction of protein relative solvent accessibility by deep learning
    Predicting the three-dimensional structure of proteins is a long-standing challenge of computational biology, as the structure (or lack of a rigid structure) is well known to determine a protein's function. Predicting relative solvent accessibility (RSA) of amino acids within a protein is a significant step towards resolving the protein structure prediction challenge, especially in cases in which structural information about a protein is not available by homology transfer. Today, arguably the core of the most powerful methods for predicting RSA and other structural features of proteins is some form of deep learning, and all the state-of-the-art protein structure prediction tools rely on some machine learning algorithm. In this article we present a deep neural network architecture composed of stacks of bidirectional recurrent neural networks and convolutional layers, which is capable of mining information from long-range interactions within a protein sequence, and apply it to the prediction of protein RSA using a novel encoding method that we shall call "clipped". The final system we present, PaleAle 5.0, which is available as a public server, predicts RSA into two, three and four classes at an accuracy exceeding 80% in two classes, surpassing the performance of all the other predictors we have benchmarked.