  • Publication
    Multi-level Attention-Based Neural Networks for Distant Supervised Relation Extraction
    We propose a multi-level attention-based neural network for relation extraction based on the work of Lin et al. to alleviate the problem of wrong labelling in distant supervision. In this paper, we first adopt gated recurrent units to represent the semantic information. Then, we introduce a customized multi-level attention mechanism, which is expected to reduce the weights of noisy words and sentences. Experimental results on a real-world dataset show that our model achieves significant improvement on relation extraction tasks compared to both traditional feature-based models and existing neural network-based methods.
      272
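The sentence-level half of such an attention mechanism can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the function names `softmax` and `attend`, the toy sentence vectors and the scores are all hypothetical, whereas in the paper the representations come from gated recurrent units and the scores are learned.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(vectors, scores):
    """Return the softmax-weighted average of the vectors, plus the weights."""
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights

# Hypothetical bag of three sentence vectors; the last one plays the "noisy" sentence.
sentences = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
scores = [2.0, 1.5, -1.0]  # assumed relevance scores, not learned here
bag_vector, weights = attend(sentences, scores)
```

A low score gives the noisy sentence a small weight, so it contributes little to `bag_vector`; the same pooling applied to word vectors inside a sentence would give the word-level attention.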
  • Publication
    Inspiring through videos: Role Models in pSTEM - You can be what you can see
    This research utilises videos of 10 female role models, representing a range of age-groups and backgrounds, who have pursued or are working in the pSTEM fields of physics, mathematics, computer science and engineering. Based on the ecological framework of factors influencing girls’ and women’s participation in STEM (UNESCO, 2017), the videos showcase the backgrounds and influences of each of the role models and highlight what they enjoy about what they do. In their conversations, the role models also identify challenges they may have faced in their careers, such as being the only woman in the room, and share advice on overcoming similar issues. The premise of the initiative is based on research demonstrating that female role models in STEM subjects can mitigate negative stereotypes and offer girls an authentic understanding of a career in STEM (McPherson, Banchefsky, & Park, 2018). By SMEC 2022, all 10 videos will have been produced, with accompanying teacher materials that will be freely downloadable for use in schools. In the next phase of the research, we will investigate the impact of utilising these videos in post-primary classrooms on students’ perceptions of and attitudes towards STEM, specifically focusing on the pSTEM subjects. We will also investigate teachers’ attitudes towards pSTEM and evaluate their feedback on the materials in order to further develop these resources. Participating schools (n = 10) will view the videos with a class-group over a number of weeks, utilising the educative materials. Class discussions will be based on areas such as: Growth Mindset, Mathematical Anxiety, Cultural Messages, Sense of Belonging and Unconscious Bias. Data generation will follow research by McKinney, Sexton, & Meyerson and will draw on both quantitative and qualitative methods, including surveys, focus groups and semi-structured interviews.
Findings will inform further research based on making STEM more inclusive and contribute to initiatives attempting to lessen the gender gap in pSTEM subjects at post-primary and undergraduate level. The ‘Lightning Talk’ at SMEC will discuss the evolution of the project, the construction of the interviews for the role model videos, selection of the role models, and pilot feedback from teachers and students. Feedback from SMEC attendees will be welcomed and incorporated into the next phase of the research design.
      86
  • Publication
    Prediction of pathological response to neo‐adjuvant chemoradiotherapy for oesophageal cancer using vibrational spectroscopy
    In oesophageal cancer (OC), neo‐adjuvant chemoradiotherapy (neoCRT) is used to debulk tumour size prior to surgery, with a complete pathological response (pCR) observed in approximately 30% of patients. Presently, no quantitative methodology exists that can predict response, in particular a pCR or major response (MR), in patients prior to therapy. Raman and Fourier transform infrared imaging were performed on OC tissue specimens acquired from 50 patients prior to therapy, to develop a computational model linking spectral data to treatment outcome. Modelling sensitivities and specificities above 85% were achieved using this approach. Parallel in‐vitro studies using an isogenic model of radioresistant OC supplied further insight into OC cell spectral response to ionising radiation, where a potential spectral biomarker of radioresistance was observed at 977 cm⁻¹. This work demonstrates that chemical imaging may provide an option for triage of patients prior to neoCRT treatment, allowing more precise prescription of treatment.
      190
  • Publication
    Prediction of quality of life in people with ALS: on the road towards explainable clinical decision support
    Amyotrophic Lateral Sclerosis (ALS) is a rare neurodegenerative disease that causes a rapid decline in motor functions and has a fatal trajectory. ALS is currently incurable, so the aim of the treatment is mostly to alleviate symptoms and improve quality of life (QoL) for the patients. The goal of this study is to develop a Clinical Decision Support System (CDSS) to alert clinicians when a patient is at risk of experiencing low QoL. The source of data was the Irish ALS Registry and interviews with the 90 patients and their primary informal caregiver at three time-points. In this dataset, there were two different scores to measure a person's overall QoL, based on the McGill QoL (MQoL) Questionnaire, and we worked towards the prediction of both. We used Extreme Gradient Boosting (XGBoost) for the development of the predictive models, which was compared to a logistic regression baseline model. Additionally, we used the Synthetic Minority Over-sampling Technique (SMOTE) to examine whether it would increase model performance, and SHAP (SHapley Additive exPlanations) as a technique to provide local and global explanations of the outputs as well as to select the most important features. The total calculated MQoL score was predicted accurately using three features - age at disease onset, ALSFRS-R score for orthopnoea and the caregiver's status pre-caregiving - with an F1-score on the test set equal to 0.81, recall of 0.78, and precision of 0.84. The addition of two extra features (caregiver's age and the ALSFRS-R score for speech) produced similar outcomes (F1-score 0.79, recall 0.70 and precision 0.90).
      72
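The F1-scores quoted in this abstract can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below only redoes that arithmetic; the function name `f1_score` is an assumption for illustration and none of the study's data or models are involved.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
three_feature = f1_score(0.84, 0.78)
five_feature = f1_score(0.90, 0.70)
print(round(three_feature, 2), round(five_feature, 2))  # prints "0.81 0.79"
```

Both values agree with the reported F1-scores, which also shows why the five-feature model scores slightly lower despite its higher precision: the harmonic mean penalises the drop in recall.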
  • Publication
    SCL-Epred: A generalised de novo eukaryotic protein subcellular localisation predictor
    Knowledge of the subcellular location of a protein provides valuable information about its function, possible interaction with other proteins and drug targetability, among other things. The experimental determination of a protein's location in the cell is expensive, time consuming and open to human error. Fast and accurate predictors of subcellular location have an important role to play if the abundance of sequence data which is now available is to be fully exploited. In the post-genomic era, genomes in many diverse organisms are available. Many of these organisms are important in human and veterinary disease and fall outside of the well-studied plant, animal and fungi groups. We have developed a general eukaryotic subcellular localisation predictor (SCL-Epred) which predicts the location of eukaryotic proteins into three classes which are important, in particular, for determining the drug targetability of a protein - secreted proteins, membrane proteins and proteins that are neither secreted nor membrane. The algorithm powering SCL-Epred is an N-to-1 neural network and is trained on very large non-redundant sets of protein sequences. SCL-Epred performs well on training data, achieving a Q of 86% and a generalised correlation of 0.75 when tested in tenfold cross-validation on a set of 15,202 redundancy-reduced protein sequences. The three-class accuracy of SCL-Epred and LocTree2, and in particular a consensus predictor comprising both methods, surpasses that of other widely used predictors when benchmarked using a large redundancy-reduced independent test set of 562 proteins. SCL-Epred is publicly available at http://distillf.ucd.ie/distill/.
    Scopus© Citations 8  206
  • Publication
    Beyond the twilight zone: automated prediction of structural properties of proteins by recursive neural networks and remote homology information
    (Wiley InterScience, 2009-10)
    The prediction of 1D structural properties of proteins is an important step toward the prediction of protein structure and function, not only in the ab initio case but also when homology information to known structures is available. Despite this, the vast majority of 1D predictors do not incorporate homology information into the prediction process. We develop a novel structural alignment method, SAMD, which we use to build alignments of putative remote homologues that we compress into templates of structural frequency profiles. We use these templates as additional input to ensembles of recursive neural networks, which we specialise for the prediction of query sequences that show only remote homology to any Protein Data Bank structure. We predict four 1D structural properties – secondary structure, relative solvent accessibility, backbone structural motifs, and contact density. Secondary structure prediction accuracy, tested by five-fold cross-validation on a large set of proteins allowing less than 25% sequence identity between training and test sets, and between query sequences and templates, exceeds 82%, outperforming its ab initio counterpart, other state-of-the-art secondary structure predictors (Jpred 3 and PSIPRED) and two other systems based on PSI-BLAST and COMPASS templates. We show that structural information from homologues improves prediction accuracy well beyond the Twilight Zone of sequence similarity, even below 5% sequence identity, for all four structural properties. Significant improvement over the extraction of structural information directly from PDB templates suggests that the combination of sequence and template information is more informative than templates alone.
    Scopus© Citations 39  704
  • Publication
    Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks
    Background: Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is not exploiting information about potential homologues of known structure. Results: We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, or relying on the sequence and on evolutionary information; one template-based, or in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates.
Furthermore, we test both ab initio and template-based 8 Å predictions on the CASP7 targets using a pre-CASP7 PDB, and find that both predictors are state-of-the-art, with the template-based one far outperforming the best CASP7 systems if templates with sequence identity to the query of 10% or better are available. Although this is not the main focus of this paper, we also report on reconstructions of Cα traces based on both ab initio and template-based 4-class map predictions, showing that the latter are generally more accurate even when homology is dubious. Conclusion: Accurate predictions of multi-class maps may provide valuable constraints for improved ab initio and template-based prediction of protein structures, naturally incorporate multiple templates, and yield state-of-the-art binary maps. Predictions of protein structures and 8 Å contact maps based on the multi-class distance map predictors described in this paper are freely available to academic users at the URL http://distill.ucd.ie/.
    Scopus© Citations 42  431
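The relationship between a multi-class distance map and a binary contact map can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the toy coordinates, and in particular the class boundaries in `THRESHOLDS`, which the abstract does not state. Only the 8 Å contact cutoff comes from the text.

```python
import math

def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def multi_class_map(dmap, thresholds):
    """Class index = number of thresholds that the distance meets or exceeds."""
    return [[sum(d >= t for t in thresholds) for d in row] for row in dmap]

def contact_map(dmap, cutoff=8.0):
    """Binary map: 1 where two residues are closer than the cutoff."""
    return [[1 if d < cutoff else 0 for d in row] for row in dmap]

# Hypothetical class boundaries in angstroms (assumed, not taken from the abstract).
THRESHOLDS = (8.0, 13.0, 19.0)

# Toy coordinates for four residues placed along a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]

dmap = distance_map(coords)
four_class = multi_class_map(dmap, THRESHOLDS)
contacts = contact_map(dmap)
```

With the first boundary set at 8 Å, class 0 of the 4-class map coincides with the binary contact map, which is one way a binary map can be read off a multi-class prediction.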
  • Publication
    Sense of Belonging of Undergraduate Computing Students: A Comparative Analysis of University Entry Routes
    Sense of Belonging (SoB) is an individual’s personal conviction as to their acceptance as a valued member of an academic community. The importance of SoB lies in correlations with motivation, persistence, and other outcomes. However, SoB is subject to variations influenced by factors such as race/ethnicity and gender. We examine the impact that entry route into university has on SoB by comparing that of students who entered our College of Science, including the School of Computer Science, via the traditional school leaving route or one of several alternative access routes.
      12
  • Publication
    De Novo Protein Subcellular Localization Prediction by N-to-1 Neural Networks
    Knowledge of the subcellular location of a protein provides valuable information about its function and possible interaction with other proteins. In the post-genomic era, fast and accurate predictors of subcellular location are required if this abundance of sequence data is to be fully exploited. We have developed a subcellular localization predictor (SCL pred) which predicts the location of a protein into four classes for animals and fungi and five classes for plants (secretory pathway, cytoplasm, nucleus, mitochondrion and chloroplast) using high throughput machine learning techniques trained on large non-redundant sets of protein sequences. The algorithm powering SCL pred is a novel Neural Network (N-to-1 Neural Network, or N1-NN) which is capable of mapping whole sequences into single properties (a functional class, in this work) without resorting to predefined transformations, but rather by adaptively compressing the sequence into a hidden feature vector. We benchmark SCL pred against other publicly available predictors using two benchmarks including a new subset of Swiss-Prot release 57. We show that SCL pred compares favourably to the other state-of-the-art predictors. Moreover, the N1-NN algorithm is fully general and may be applied to a host of problems of similar shape, that is, in which a whole sequence needs to be mapped into a fixed-size array of properties, and the adaptive compression it operates may even shed light on the space of protein sequences.
    Scopus© Citations 3  157
  • Publication
    Development of an Explainable Clinical Decision Support System for the Prediction of Patient Quality of Life in Amyotrophic Lateral Sclerosis
    Amyotrophic Lateral Sclerosis (ALS) is a rare neurodegenerative and currently incurable disease. It causes a rapid decline in motor functions and has a fatal trajectory. The aim of the treatment is mostly to alleviate symptoms and improve the patient’s quality of life (QoL). The goal of this study is to develop a Clinical Decision Support System (CDSS) in order to alert clinicians when a patient is at risk of experiencing a low QoL, so that they are better supported. The source of the data was the Irish ALS Registry and interviews with the 90 patients and their primary informal caregiver at three time-points. In this dataset, there were two different scores to measure a person’s overall QoL, based on the McGill QoL (MQoL) Questionnaire, and we worked towards the prediction of both. The method we used for the development of the predictive models was Extreme Gradient Boosting (XGBoost), which was compared to a logistic regression baseline model. We used SHAP (SHapley Additive exPlanations) values as a technique to provide local and global explanations of the outputs as well as to select the most important features. The total calculated MQoL score was predicted accurately by three features, with an F1-score on the test set equal to 0.81, a recall score of 0.78, and a precision score of 0.84, while the addition of two features produced similar outcomes (0.79, 0.70 and 0.90 respectively). The three most important features were the age at disease onset, ALSFRS score for orthopnoea and the caregiver’s status pre-caregiving.
    Scopus© Citations 4  49