  • Publication
    Potential utility of docking to identify protein-peptide binding regions
    (University College Dublin. School of Computer Science and Informatics, 2013-05)
    Disordered regions of proteins often bind to structured domains, mediating interactions within and between proteins. However, it is difficult to identify a priori the short regions involved in binding. We set out to determine if docking peptides to peptide binding domains would assist in these predictions. First, we investigated the docking of known short peptides to their native and non-native peptide binding domains. We then investigated the docking of overlapping peptides adjacent to the native peptide. We found only weak discrimination of docking scores between native peptide and adjacent peptides in this context, with similar results for both ordered and disordered regions. Finally, we trained a bidirectional recurrent neural network using as input the peptide sequence, predicted secondary structure, Vina docking score and Pepsite score. We conclude that docking has only modest power to define the location of a peptide within a larger protein region known to contain it. However, this information can be used in training machine learning methods which may allow for the identification of peptide binding regions within a protein sequence.
  • Publication
    Categorizing Compiler Error Messages with Principal Component Analysis
    Being a competent programmer is critical for students in all computing disciplines and software engineering in particular. Novice programming students face a number of challenges and these have been shown to contribute to worrying dropout rates for students majoring in computing, and the growing number of non-majors who are learning to program. Methods of identifying and helping at-risk programming students have been researched for decades. Much of this research focuses on categorizing the errors that novice programmers make, in order to help understand why these errors are made, with the goal of helping them overcome these errors quickly, or avoid them altogether. This paper presents the first known work on categorizing compiler errors using Principal Component Analysis. In this, we find a new way of discovering categories of related errors from data produced by the students in the course of their programming activity. This method may be used to identify where these students are struggling and provide direction in efforts to help them.
  • Publication
    De-repression of myelin-regulating gene expression after status epilepticus in mice lacking the C/EBP homologous protein CHOP
    The C/EBP homologous protein CHOP is normally present at low levels in cells but increases rapidly after insults such as DNA damage or endoplasmic reticulum stress, where it contributes to cellular homeostasis and apoptosis. By forming heterodimers with other transcription factors, CHOP can either act as a dominant-negative regulator of gene expression or induce the expression of target genes. Recent work demonstrated that seizure-induced hippocampal damage is significantly worse in mice lacking CHOP and that these animals go on to develop an aggravated epileptic phenotype. To identify novel CHOP-controlled target genes which potentially influence the epileptic phenotype, we performed a bioinformatics analysis of tissue microarrays from chop-deficient mice after prolonged seizures. GO analysis revealed that genes associated with biological membranes were prominent among those in the chop-deficient array dataset, and we identified myelin-associated genes to be particularly de-repressed. These data suggest CHOP might act as an inhibitor of myelin-associated processes in the brain and could be targeted to influence axonal regeneration or reorganisation.
  • Publication
    Multi-level Attention-Based Neural Networks for Distant Supervised Relation Extraction
    We propose a multi-level attention-based neural network for relation extraction based on the work of Lin et al. to alleviate the problem of wrong labelling in distant supervision. In this paper, we first adopt gated recurrent units to represent the semantic information. Then, we introduce a customized multi-level attention mechanism, which is expected to reduce the weights of noisy words and sentences. Experimental results on a real-world dataset show that our model achieves significant improvement on relation extraction tasks compared to both traditional feature-based models and existing neural network-based methods.
  • Publication
    Sense of Belonging: The Intersectionality of Self-Identified Minority Status and Gender in Undergraduate Computer Science Students
    Creating inclusive learning environments for all students is of primary importance. Student sense of belonging is an important part of this. However, sense of belonging can show variations according to factors such as ethnicity and gender as well as influencing attributes such as motivation and persistence. We utilised a survey adapted from the "Math Sense of Belonging Scale" to examine the relationship between undergraduate computer science students' sense of belonging, gender identity, and self-declared minority status. We observed a lower sense of belonging in students who identified as women who also self-identified as being part of a minority group. However, students who identified as women who did not identify as belonging to a minority had a sense of belonging comparable to those identifying as men. Our results provide insight that may help us improve the sense of belonging of our undergraduate students, particularly those identifying as women and as belonging to a minority in computer science. It has also brought to our attention that action needs to be taken to mitigate the potentially disproportionately negative consequences that COVID-19 may have on these students due to reduced opportunities for social interaction and the negative impact that this has on sense of belonging.
  • Publication
    CPPpred: prediction of cell penetrating peptides
    Summary: Cell penetrating peptides (CPPs) are attracting much attention as a means of overcoming the inherently poor cellular uptake of various bioactive molecules. Here, we introduce CPPpred, a web server for the prediction of CPPs using an N-to-1 neural network. The server takes one or more peptide sequences, between 5 and 30 amino acids in length, as input and returns a prediction of how likely each peptide is to be cell penetrating. CPPpred was developed with redundancy-reduced training and test sets, offering an advantage over the only other currently available CPP prediction method.
  • Publication
    In Silico Protein Motif Discovery and Structural Analysis
    A wealth of in silico tools is available for protein motif discovery and structural analysis. The aim of this chapter is to collect some of the most common and useful tools and to guide the biologist in their use. A detailed explanation is provided for the use of Distill, a suite of web servers for the prediction of protein structural features and the prediction of full-atom 3D models from a protein sequence. Besides this, we also provide pointers to many other tools available for motif discovery and secondary and tertiary structure prediction from a primary amino acid sequence. The prediction of protein intrinsic disorder and the prediction of functional sites and SLiMs are also briefly discussed. Given that user queries vary greatly in size, scope and character, the trade-offs in speed, accuracy and scale need to be considered when choosing which methods to adopt.
  • Publication
    RNA-sequencing analysis of umbilical cord plasma microRNAs from healthy newborns
    MicroRNAs are a class of small non-coding RNA that regulate gene expression at a post-transcriptional level. MicroRNAs have been identified in various body fluids under normal conditions and their stability as well as their dysregulation in disease has led to ongoing interest in their diagnostic and prognostic potential. Circulating microRNAs may be valuable predictors of early-life complications such as birth asphyxia or neonatal seizures but there are relatively few data on microRNA content in plasma from healthy babies. Here we performed small RNA-sequencing analysis of plasma processed from umbilical cord blood in a set of healthy newborns. MicroRNA levels in umbilical cord plasma of four male and four female healthy babies from two different centres were profiled. A total of 1,004 individual microRNAs were identified, which ranged from 426 to 659 per sample, of which 269 microRNAs were common to all eight samples. Many of these microRNAs are highly expressed and consistent with previous studies using other high throughput platforms. While overall microRNA expression did not differ between male and female cord blood plasma, we did detect differentially edited microRNAs in female plasma compared to male. Of note, and consistent with other studies of this type, adenylation and uridylation were the two most prominent forms of editing. Six microRNAs, miR-128-3p, miR-29a-3p, miR-9-5p, miR-218-5p, miR-204-5p and miR-132-3p, were consistently both uridylated and adenylated in female cord blood plasma. These results provide a benchmark for microRNA profiling and biomarker discovery using umbilical cord plasma and can be used as comparative data for future biomarker profiles from complicated births or those with early-life developmental disorders.
  • Publication
    Altered Biogenesis and MicroRNA Content of Hippocampal Exosomes Following Experimental Status Epilepticus
    Repetitive or prolonged seizures (status epilepticus) can damage neurons within the hippocampus, trigger gliosis, and generate an enduring state of hyperexcitability. Recent studies have suggested that microvesicles including exosomes are released from brain cells following stimulation and tissue injury, conveying contents between cells including microRNAs (miRNAs). Here, we characterized the effects of experimental status epilepticus on the expression of exosome biosynthesis components and analyzed miRNA content in exosome-enriched fractions. Status epilepticus induced by unilateral intra-amygdala kainic acid in mice resulted in acute subfield-specific, bi-directional changes in hippocampal transcripts associated with exosome biosynthesis including up-regulation of endosomal sorting complexes required for transport (ESCRT)-dependent and -independent pathways. Increased expression of exosome components including Alix was detectable in samples obtained 2 weeks after status epilepticus and changes occurred in both the ipsilateral and contralateral hippocampus. RNA sequencing of exosome-enriched fractions prepared using two different techniques detected a rich diversity of conserved miRNAs and showed that status epilepticus selectively alters miRNA contents. We also characterized editing sites of the exosome-enriched miRNAs and found six exosome-enriched miRNAs that were adenosine-to-inosine (ADAR) edited with the majority of the editing events predicted to occur within miRNA seed regions. However, the prevalence of these editing events was not altered by status epilepticus. These studies demonstrate that status epilepticus alters the exosome pathway and its miRNA content, but not editing patterns. Further functional studies will be needed to determine if these changes have pathophysiological significance for epileptogenesis.
  • Publication
    SCL-Epred: A generalised de novo eukaryotic protein subcellular localisation predictor
    Knowledge of the subcellular location of a protein provides valuable information about its function, possible interaction with other proteins and drug targetability, among other things. The experimental determination of a protein's location in the cell is expensive, time-consuming and open to human error. Fast and accurate predictors of subcellular location have an important role to play if the abundance of sequence data which is now available is to be fully exploited. In the post-genomic era, genomes in many diverse organisms are available. Many of these organisms are important in human and veterinary disease and fall outside of the well-studied plant, animal and fungi groups. We have developed a general eukaryotic subcellular localisation predictor (SCL-Epred) which predicts the location of eukaryotic proteins into three classes which are important, in particular, for determining the drug targetability of a protein - secreted proteins, membrane proteins and proteins that are neither secreted nor membrane. The algorithm powering SCL-Epred is an N-to-1 neural network and is trained on very large non-redundant sets of protein sequences. SCL-Epred performs well on training data achieving a Q of 86% and a generalised correlation of 0.75 when tested in tenfold cross-validation on a set of 15,202 redundancy-reduced protein sequences. The three-class accuracy of SCL-Epred and LocTree2, and in particular a consensus predictor comprising both methods, surpasses that of other widely used predictors when benchmarked using a large redundancy-reduced independent test set of 562 proteins. SCL-Epred is publicly available at