Subcellular Localization Prediction by Deep N-to-1 Convolutional Neural Networks
Author(s)
Date Issued
2025
Date Available
2025-11-14T14:28:08Z
Abstract
The subcellular location of a protein provides valuable insight for drug design and discovery, genomics, and other areas of medical research. Experimental methods for determining protein subcellular localization are labour-intensive, costly, and time-consuming, whereas computational methods, if accurate, would be a much more efficient alternative. This thesis introduces SCLpredECL, an ab initio predictor of protein subcellular localization in eukaryotic organisms, powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpredECL is trained and tested on strictly redundancy-reduced data sets to limit homology between them, and uses an in-lab encoding scheme that has led to significant performance improvements in similar predictive tasks. SCLpredECL classifies eukaryotic proteins into eight classes (Other, Cytoplasm, Golgi apparatus, Membrane, Mitochondrion, Nucleus, Plastid, and Secreted). Results are measured in 5-fold cross-validation. SCLpredECL correctly predicts 63% of sequences on the training, validation, and independent test sets. SCLpredECL is extensively tested for hyper-parameter and model selection, with hyper-parameters tuned through a grid search. This predictor is a step towards bridging the gap between a protein sequence and the protein's function. In this thesis, we also explore the prediction accuracy of SCLpredECL without alignments. Alignments are crucial, as they enable accurate predictions and analyses in various applications, including protein subcellular localization. We tested Deep N-to-1 Convolutional Neural Network configurations of various depths and widths to identify better-performing values across the eight classes without adding alignments.
We conducted a comprehensive evaluation of SCLpredECL both with and without alignments to assess its prediction accuracy under varying conditions. The average difference in peak performance between models with alignments and models without alignments is approximately 15.82%. The average difference in the highest accuracy achieved with alignments compared to without alignments is approximately 15.16%. Thus, extensive experimentation indicates that models with alignments achieve higher accuracy and are more reliable, and can be trusted to deliver consistent performance across different layers and classes of subcellular localization prediction. This dual approach offers valuable insights into how alignments influence prediction accuracy and enhances the robustness of SCLpredECL by potentially reducing the need for extensive experimental validations. Furthermore, we expanded our research to include an analysis using the DOME Registry. The DOME Registry is a database designed to facilitate the standardization and reporting of machine learning methodologies in biology. It aims to bridge the gap between bioinformatics and machine learning by ensuring that published studies adhere to standardized reporting guidelines. This effort is crucial for the advancement of the field, as it allows for more accurate comparisons and evaluations of different methodologies. The primary goal of the DOME Registry is to standardize the terminology and technical details reported in bioinformatics studies. This standardization helps create a uniform framework that researchers can rely on to evaluate and compare various methods with better reproducibility. After regressive interpretation of DOME, we conclude that it is essential to adjust the review framework so that it recognizes the varying importance of its criteria across different studies.
Such modifications could significantly enhance both the precision and the reliability of the registry's evaluations, ensuring that each study is judged fairly and on relevant merits.
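The abstract mentions 5-fold cross-validation and grid-search hyper-parameter tuning over network depth and width. The following is only a minimal sketch of how those two ideas combine, not the thesis's actual pipeline: the function names, the grid values, and the `toy_evaluate` scoring function (a stand-in for training and scoring a network) are all hypothetical.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def grid_search(param_grid, evaluate, n_samples, k=5):
    """Return (best_params, best_score) by mean validation score over k folds."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    # Enumerate every combination of hyper-parameter values in the grid.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(evaluate(params, train_idx, val_idx)
                     for train_idx, val_idx in k_fold_indices(n_samples, k))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params, train_idx, val_idx):
    """Hypothetical score: a real version would train on train_idx
    and return the accuracy measured on val_idx."""
    return -abs(params["depth"] - 3) - abs(params["width"] - 64) / 64


# Hypothetical grid; the thesis's actual search space is not stated here.
grid = {"depth": [1, 3, 5], "width": [32, 64, 128]}
best_params, best_score = grid_search(grid, toy_evaluate, n_samples=103)
```

In a real setting, `evaluate` would also need to respect the redundancy reduction described above, so that homologous sequences do not end up on both sides of a fold.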
Type of Material
Doctoral Thesis
Qualification Name
Doctor of Philosophy (Ph.D.)
Publisher
University College Dublin. School of Computer Science
Copyright (Published Version)
2025 the Author
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
File(s)
Name
Maryam Gillani 19202770.pdf
Size
2.27 MB
Format
Adobe PDF
Checksum (MD5)
0877b31027edf53f0e3c52c945d626ef