  • Publication
    Potential utility of docking to identify protein-peptide binding regions
    (University College Dublin. School of Computer Science and Informatics, 2013-05)
    Disordered regions of proteins often bind to structured domains, mediating interactions within and between proteins. However, it is difficult to identify a priori the short regions involved in binding. We set out to determine if docking peptides to peptide binding domains would assist in these predictions. First, we investigated the docking of known short peptides to their native and non-native peptide binding domains. We then investigated the docking of overlapping peptides adjacent to the native peptide. We found only weak discrimination of docking scores between the native peptide and adjacent peptides in this context, with similar results for both ordered and disordered regions. Finally, we trained a bidirectional recurrent neural network using as input the peptide sequence, predicted secondary structure, Vina docking score and Pepsite score. We conclude that docking has only modest power to define the location of a peptide within a larger protein region known to contain it. However, this information can be used in training machine learning methods, which may allow for the identification of peptide binding regions within a protein sequence.
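The "overlapping peptides adjacent to the native peptide" step can be pictured as a sliding window over the surrounding protein region. The following is only a minimal sketch: the function name, the example sequence, the window length and the step size are all illustrative assumptions, not values taken from the study.

```python
def overlapping_peptides(sequence: str, length: int, step: int = 1) -> list:
    """Return (start_index, peptide) for every window of `length` residues.

    Hypothetical helper: the abstract does not state the window length or
    the overlap used, so both are left as parameters.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [
        (start, sequence[start:start + length])
        for start in range(0, len(sequence) - length + 1, step)
    ]


# Made-up 10-residue region, used only to show the shape of the output.
region = "MKTAYIAKQR"
windows = overlapping_peptides(region, length=6, step=2)
for start, peptide in windows:
    print(start, peptide)
```

Each window would then be docked and scored separately, so that the score of the native peptide can be compared with the scores of its neighbours.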
  • Publication
    In silico approaches to predict the potential of milk protein-derived peptides as dipeptidyl peptidase IV (DPP-IV) inhibitors
    Molecular docking of a library of all 8000 possible tripeptides to the active site of DPP-IV was used to determine their binding potential. A number of tripeptides were selected for experimental testing; however, there was no direct correlation between the Vina score and their in vitro DPP-IV inhibitory properties. While Trp-Trp-Trp, the peptide with the best docking score, was a moderate DPP-IV inhibitor (IC50 216 μM), Lineweaver and Burk analysis revealed its action to be non-competitive. This suggested that it may not bind to the active site of DPP-IV as assumed in the docking prediction. Furthermore, there was no significant link between DPP-IV inhibition and the physicochemical properties of the peptides (molecular mass, hydrophobicity, hydrophobic moment (μH), isoelectric point (pI) and charge). LIGPLOTs indicated that competitive inhibitory peptides were predicted to have both hydrophobic and hydrogen bond interactions with the active site of DPP-IV. DPP-IV inhibitory peptides generally had a hydrophobic or aromatic amino acid at the N-terminus, preferentially a Trp for non-competitive inhibitors and a broader range of residues for competitive inhibitors (Ile, Leu, Val, Phe, Trp or Tyr). Two of the potent DPP-IV inhibitors, Ile-Pro-Ile and Trp-Pro (IC50 values of 3.5 and 44.2 μM, respectively), were predicted to be gastrointestinally/intestinally stable. This work highlights the need to test the assumptions (i.e. competitive binding) of any integrated strategy of computational and experimental screening, in optimizing screening. Future strategies targeting allosteric mechanisms may need to rely more on structure-activity relationship modeling, rather than on docking, in computationally selecting peptides for screening.
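The figure of 8000 tripeptides follows from the 20 standard amino acids taken three at a time with repetition (20³ = 8000). A minimal sketch of enumerating such a library is shown here; the helper names are illustrative, and the docking step itself is not reproduced.

```python
from itertools import product

# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}
AMINO_ACIDS = "".join(THREE_LETTER)


def all_peptides(length: int) -> list:
    """Enumerate every peptide of the given length (hypothetical helper)."""
    return ["".join(residues) for residues in product(AMINO_ACIDS, repeat=length)]


def to_three_letter(peptide: str) -> str:
    """Convert one-letter notation to the hyphenated three-letter form."""
    return "-".join(THREE_LETTER[residue] for residue in peptide)


tripeptides = all_peptides(3)
print(len(tripeptides))
print(to_three_letter("WWW"), to_three_letter("IPI"))
```

The library size grows as 20ⁿ, which is why exhaustive docking is practical for tripeptides but quickly becomes expensive for longer peptides.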
  • Publication
    Genome-wide epistatic expression quantitative trait loci discovery in four human tissues reveals the importance of local chromosomal interactions governing gene expression
    Background: Epistasis (synergistic interaction) among SNPs governing gene expression is likely to arise within transcriptional networks. However, the power to detect it is limited by the large number of combinations to be tested and the modest sample sizes of most datasets. By limiting the interaction search space firstly to cis-trans and then cis-cis SNP pairs where both SNPs had an independent effect on the expression of the most variable transcripts in the liver and brain, we greatly reduced the size of the search space. Results: Within the cis-trans search space we discovered three transcripts with significant epistasis. Surprisingly, all interacting SNP pairs were located nearby each other on the chromosome (within 290 kb-2.16 Mb). Despite their proximity, the interacting SNPs were outside the range of linkage disequilibrium (LD), which was absent between the pairs (r² < 0.01). Accordingly, we redefined the search space to detect cis-cis interactions, where a cis-SNP was located within 10 Mb of the target transcript. The results of this show evidence for the epistatic regulation of 50 transcripts across the tissues studied. Three transcripts, namely, HLA-G, PSORS1C1 and HLA-DRB5 share common regulatory SNPs in the pre-frontal cortex and their expression is significantly correlated. This pattern of epistasis is consistent with mediation via long-range chromatin structures rather than the binding of transcription factors in trans. Accordingly, some of the interactions map to regions of the genome known to physically interact in lymphoblastoid cell lines while others map to known promoter and enhancer elements. SNPs involved in interactions appear to be enriched for promoter markers. Conclusions: In the context of gene expression and its regulation, our analysis indicates that the study of cis-cis or local epistatic interactions may have a more important role than interchromosomal interactions.
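The cis-cis restriction described in this abstract (a cis-SNP lies within 10 Mb of the target transcript) can be sketched as a simple positional filter followed by pair enumeration. This is an illustration under stated assumptions: the `Locus` type, the SNP names and the coordinates are hypothetical, the boundary is treated as inclusive, and the additional requirement that each SNP has an independent effect on expression is not modelled.

```python
from itertools import combinations
from typing import NamedTuple

CIS_WINDOW_BP = 10_000_000  # 10 Mb, the distance quoted in the abstract


class Locus(NamedTuple):
    chrom: str
    pos: int  # base-pair coordinate


def is_cis(snp: Locus, transcript: Locus, window: int = CIS_WINDOW_BP) -> bool:
    """True if the SNP is on the transcript's chromosome and within `window` bp."""
    return snp.chrom == transcript.chrom and abs(snp.pos - transcript.pos) <= window


def cis_cis_pairs(snps: dict, transcript: Locus) -> list:
    """All unordered pairs of cis-SNPs for one transcript."""
    cis_ids = sorted(name for name, locus in snps.items() if is_cis(locus, transcript))
    return list(combinations(cis_ids, 2))


# Hypothetical coordinates, chosen only to exercise the filter.
transcript = Locus("6", 30_000_000)
snps = {
    "snpA": Locus("6", 29_500_000),
    "snpB": Locus("6", 31_200_000),
    "snpC": Locus("6", 45_000_000),  # same chromosome, but 15 Mb away
    "snpD": Locus("1", 30_000_000),  # different chromosome
}
pairs = cis_cis_pairs(snps, transcript)
print(pairs)
```

Restricting the test to such pairs is what shrinks the search space: the number of candidate pairs per transcript depends only on the SNPs inside the window rather than on every SNP in the genome.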
  • Publication
    SLiMFinder : a web server to find novel, significantly over-represented, short protein motifs
    Short, linear motifs (SLiMs) play a critical role in many biological processes, particularly in protein–protein interactions. The Short, Linear Motif Finder (SLiMFinder) web server is a de novo motif discovery tool that identifies statistically over-represented motifs in a set of protein sequences, accounting for the evolutionary relationships between them. Motifs are returned with an intuitive P-value that greatly reduces the problem of false positives and is accessible to biologists of all disciplines. Input can be uploaded by the user or extracted directly from UniProt. Numerous masking options give the user great control over the contextual information to be included in the analyses. The SLiMFinder server combines these with user-friendly output and visualizations of motif context to allow the user to quickly gain insight into the validity of a putatively functional motif. These visualizations include alignments of motif occurrences, alignments of motifs and their homologues and a visual schematic of the top-ranked motifs. Returned motifs can also be compared with known SLiMs from the literature using CompariMotif. All results are available for download. The SLiMFinder server is available at:
  • Publication
    CycloPs : generating virtual libraries of cyclized and constrained peptides including nonnatural amino acids
    We introduce CycloPs, software for the generation of virtual libraries of constrained peptides including natural and nonnatural commercially available amino acids. The software is written in the cross-platform Python programming language, and features include generating virtual libraries in one-dimensional SMILES and three-dimensional SDF formats, suitable for virtual screening. The stand-alone software is capable of filtering the virtual libraries using empirical measurements, including peptide synthesizability by standard peptide synthesis techniques, stability, and the druglike properties of the peptide. The software and accompanying Web interface are designed to enable the rapid and convenient generation of large, structurally diverse, synthesizable virtual libraries of constrained peptides for use in virtual screening experiments. The stand-alone software, and the Web interface for evaluating these empirical properties of a single peptide, are available at
  • Publication
    A novel approach of homozygous haplotype sharing identifies candidate genes in autism spectrum disorder
    Autism spectrum disorder (ASD) is a highly heritable disorder of complex and heterogeneous aetiology. It is primarily characterized by altered cognitive ability including impaired language and communication skills and fundamental deficits in social reciprocity. Despite some notable successes in neuropsychiatric genetics, overall, the high heritability of ASD (~90%) remains poorly explained by common genetic risk variants. However, recent studies suggest that rare genomic variation, in particular copy number variation, may account for a significant proportion of the genetic basis of ASD. We present a large scale analysis to identify candidate genes which may contain low-frequency recessive variation contributing to ASD while taking into account the potential contribution of population differences to the genetic heterogeneity of ASD. Our strategy, homozygous haplotype (HH) mapping, aims to detect homozygous segments of identical haplotype structure that are shared at a higher frequency amongst ASD patients compared to parental controls. The analysis was performed on 1,402 Autism Genome Project trios genotyped for 1 million single nucleotide polymorphisms (SNPs). We identified 25 known and 1,218 novel ASD candidate genes in the discovery analysis including CADM2, ABHD14A, CHRFAM7A, GRIK2, GRM3, EPHA3, FGF10, KCND2, PDZK1, IMMP2L and FOXP2. Furthermore, 10 of the previously reported ASD genes and 300 of the novel candidates identified in the discovery analysis were replicated in an independent sample of 1,182 trios. Our results demonstrate that regions of HH are significantly enriched for previously reported ASD candidate genes and the observed association is independent of gene size (odds ratio 2.10). Our findings highlight the applicability of HH mapping in complex disorders such as ASD and offer an alternative approach to the analysis of genome-wide association data.
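The enrichment reported in this abstract is summarised as an odds ratio. As a reminder of how an odds ratio is computed from a 2×2 table, a minimal sketch follows; the table layout and the counts are purely hypothetical and are not the study's data, and the abstract does not describe the exact test that was used.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table laid out (by assumption) as:

                          inside HH regions   outside HH regions
    known ASD genes              a                    b
    other genes                  c                    d
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Made-up counts, for illustration only.
example = odds_ratio(a=20, b=80, c=10, d=90)
print(example)
```

An odds ratio above 1 indicates that the first row (here, known candidate genes) falls inside the regions of interest more often than the second row does.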
  • Publication
    Design and Evaluation of Antimalarial Peptides Derived from Prediction of Short Linear Motifs in Proteins Related to Erythrocyte Invasion
    The purpose of this study was to investigate the blood stage of the malaria-causing parasite, Plasmodium falciparum, to predict potential protein interactions between the parasite merozoite and the host erythrocyte and design peptides that could interrupt these predicted interactions. We screened the P. falciparum and human proteomes for computationally predicted short linear motifs (SLiMs) in cytoplasmic portions of transmembrane proteins that could play roles in the invasion of the erythrocyte by the merozoite, an essential step in malarial pathogenesis. We tested thirteen peptides predicted to contain SLiMs, twelve of them palmitoylated to enhance membrane targeting, and found three that blocked parasite growth in culture by inhibiting the initiation of new infections in erythrocytes. Scrambled peptides for two of the most promising peptides suggested that their activity may be reflective of amino acid properties, in particular, positive charge. However, one peptide showed effects which were stronger than those of scrambled peptides. This was derived from human red blood cell glycophorin-B. We concluded that proteome-wide computational screening of the intracellular regions of both host and pathogen adhesion proteins provides potential lead peptides for the development of anti-malarial compounds.
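A scrambled control peptide keeps the amino acid composition (and therefore bulk properties such as net charge) while destroying the residue order that a motif depends on. The sketch below shows one simple way to generate such a control; the function name and the example sequence are hypothetical, and the abstract does not say how the study's scrambled peptides were designed.

```python
import random
from typing import Optional


def scramble(peptide: str, seed: Optional[int] = None, max_tries: int = 100) -> str:
    """Return a random permutation of `peptide` that differs from the input.

    Falls back to the last shuffle if no different ordering is found
    (for example, when every residue is identical).
    """
    rng = random.Random(seed)
    residues = list(peptide)
    candidate = peptide
    for _ in range(max_tries):
        rng.shuffle(residues)
        candidate = "".join(residues)
        if candidate != peptide:
            break
    return candidate


original = "KRKLSFAG"  # made-up sequence, not one of the tested peptides
control = scramble(original, seed=1)
print(original, control)
```

If a scrambled control is as active as the original, as the abstract reports for two peptides, the activity is more plausibly explained by composition than by a specific motif.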
  • Publication
    Virtual Screening Using Combinatorial Cyclic Peptide Libraries Reveals Protein Interfaces Readily Targetable by Cyclic Peptides
    Protein–protein and protein–peptide interactions are responsible for the vast majority of biological functions in vivo, but targeting these interactions with small molecules has historically been difficult. What is required are efficient combined computational and experimental screening methods to choose among a number of potential protein interfaces worthy of targeting lead macrocyclic compounds for further investigation. To achieve this, we have generated combinatorial 3D virtual libraries of short disulfide-bonded peptides and compared them to pharmacophore models of important protein–protein and protein–peptide structures, including short linear motifs (SLiMs), protein-binding peptides, and turn structures at protein–protein interfaces, built from 3D models available in the Protein Data Bank. We prepared a total of 372 reference pharmacophores, which were matched against 108,659 multiconformer cyclic peptides. After normalization to exclude nonspecific cyclic peptides, the top hits notably are enriched for mimetics of turn structures, including a turn at the interaction surface of human α thrombin, and also feature several protein-binding peptides. The top cyclic peptide hits also cover the critical 'hot spot' interaction sites predicted from the interaction crystal structure. We have validated our method by testing cyclic peptides predicted to inhibit thrombin, a key protein in the blood coagulation pathway of important therapeutic interest, identifying a cyclic peptide inhibitor with lead-like activity. We conclude that protein interfaces most readily targetable by cyclic peptides and related macrocyclic drugs may be identified computationally among a set of candidate interfaces, accelerating the choice of interfaces against which lead compounds may be screened.
  • Publication
    Towards the Improved Discovery and Design of Functional Peptides: Common Features of Diverse Classes Permit Generalized Prediction of Bioactivity
    The conventional wisdom is that certain classes of bioactive peptides have specific structural features that endow their particular functions. Accordingly, predictions of bioactivity have focused on particular subgroups, such as antimicrobial peptides. We hypothesized that bioactive peptides may share more general features, and assessed this by contrasting the predictive power of existing antimicrobial predictors as well as a novel general predictor, PeptideRanker, across different classes of peptides. We observed that existing antimicrobial predictors had reasonable predictive power to identify peptides of certain other classes, i.e. toxin and venom peptides. We trained two general predictors of peptide bioactivity, one focused on short peptides (4-20 amino acids) and one focused on long peptides (>20 amino acids). These general predictors had performance that was typically as good as, or better than, that of specific predictors. We noted some striking differences in the features of short peptide and long peptide predictions; in particular, high scoring short peptides favour phenylalanine. This is consistent with the hypothesis that short and long peptides have different functional constraints, perhaps reflecting the difficulty for typical short peptides in supporting independent tertiary structure. We conclude that there are general shared features of bioactive peptides across different functional classes, indicating that computational prediction may accelerate the discovery of novel bioactive peptides and aid in the improved design of existing peptides, across many functional classes. An implementation of the predictive method, PeptideRanker, may be used to identify among a set of peptides those that may be more likely to be bioactive.
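The two length classes named in this abstract (short: 4-20 amino acids; long: more than 20) amount to a routing rule that decides which predictor a peptide would be sent to. A minimal sketch of that rule is given here; the function is hypothetical and is not part of PeptideRanker, and peptides shorter than four residues are simply reported as outside both classes.

```python
from typing import Optional


def length_class(peptide: str) -> Optional[str]:
    """Classify a peptide by the length ranges given in the abstract."""
    n = len(peptide)
    if 4 <= n <= 20:
        return "short"
    if n > 20:
        return "long"
    return None  # fewer than 4 residues: covered by neither predictor


for example in ("FPF", "FLPF", "A" * 20, "A" * 21):
    print(len(example), length_class(example))
```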
  • Publication
    Profile-based short linear protein motif discovery
    (BioMed Central, 2012-05-18)
    Background: Short linear protein motifs are attracting increasing attention as functionally independent sites, typically 3-10 amino acids in length, that are enriched in disordered regions of proteins. Multiple methods have recently been proposed to discover over-represented motifs within a set of proteins based on simple regular expressions. Here, we extend these approaches to profile-based methods, which provide a richer motif representation. Results: The profile motif discovery method MEME performed relatively poorly for motifs in disordered regions of proteins. However, when we applied evolutionary weighting to account for redundancy amongst homologous proteins, and masked out poorly conserved regions of disordered proteins, the performance of MEME was equivalent to that of regular expression methods. Nevertheless, the two approaches returned different subsets of motifs within both a benchmark dataset and a more realistic discovery dataset. Conclusions: Profile-based motif discovery methods complement regular expression based methods. Whilst profile-based methods are computationally more intensive, they are likely to discover motifs currently overlooked by regular expression methods.
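The difference between the two motif representations contrasted in this abstract can be illustrated on a toy example. The sketch below matches a regular expression and, separately, scores windows against a log-odds profile built from a few aligned instances. Everything in it is an assumption made for illustration: the motif, the instances, the sequence, the uniform background and the pseudocount. It is not MEME (which fits profiles by expectation maximisation), and it omits the evolutionary weighting and masking described in the abstract.

```python
import math
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def regex_hits(pattern: str, sequence: str) -> list:
    """Non-overlapping (start, match) pairs for a regular-expression motif."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]


def build_profile(instances: list, pseudocount: float = 1.0) -> list:
    """Per-position log2-odds scores against a uniform background."""
    width = len(instances[0])
    if any(len(instance) != width for instance in instances):
        raise ValueError("all instances must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*instances):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_profile_hit(profile: list, sequence: str) -> tuple:
    """Highest-scoring window as (start, peptide, score)."""
    width = len(profile)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    scored = [
        (sum(col.get(aa, 0.0) for col, aa in zip(profile, sequence[i:i + width])), i)
        for i in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start, sequence[start:start + width], score


sequence = "MGARASPTAPLLE"                      # made-up sequence
hits = regex_hits(r"P[ST]AP", sequence)         # regular-expression motif
profile = build_profile(["PTAP", "PSAP", "PTAP"])
start, peptide, score = best_profile_hit(profile, sequence)
print(hits, start, peptide, round(score, 2))
```

A regular expression either matches or does not, whereas a profile assigns every window a graded score, which is one reason the two approaches can return different subsets of motifs.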