On the predictability and prediction of protein abundances in tumours
Author(s)
Date Issued
2024
Date Available
2025-11-14T16:55:25Z
Abstract
mRNA abundances have been used as proxies for protein abundances for decades. However, the increasing availability of proteogenomic datasets has revealed only a moderate correlation between mRNA and protein abundances. Moreover, certain classes of proteins display higher or lower than average concordance between mRNA and protein abundances. In particular, proteins that function within complexes appear to have lower-than-average mRNA-protein correlations. While biological factors, such as post-transcriptional regulation, are often considered to play a substantial role in attenuating the mRNA-protein correlation, technical factors have received little attention. The two aims of this thesis are to i) explore the influence of technical factors on observed mRNA-protein correlations and ii) predict protein abundances from mRNA abundances using machine learning models. First, by analysing tumour and cancer cell line studies with mass-spectrometry-based replicate proteomic profiles, I show that proteins whose abundances can be reproducibly quantified tend to have higher observed mRNA-protein correlations. Furthermore, the reproducibility of measurements of individual proteins is found to be mostly consistent across different studies. I exploit this to develop an aggregated protein reproducibility score that explains, on average, ~14% of the variation in the mRNA-protein correlation of different studies. Second, through analysis of studies containing protein expression profiles quantified using an antibody-based technique (Reverse Phase Protein Arrays), I show that the reliability of antibodies influences the observed correlation between mRNA and protein abundances, i.e. proteins with reliable antibodies have a higher mRNA-protein correlation. These results suggest that technical factors contribute to attenuating the mRNA-protein correlations more than expected.
Having identified reliably quantified proteins, I finally developed machine learning models to assess the predictability of protein abundances using latent features derived from mRNA abundances. I show that latent spaces, representing different aspects of the cell in lower dimensions, perform better than previously identified approaches to predicting protein abundances. However, in general, I observe that the predictability of proteins is poor. Overall, I show that technical factors substantially contribute to the variation in observed mRNA-protein correlations across proteins. Despite the better performance of latent features derived from mRNA abundances, the overall ability of machine learning models to predict protein abundances is limited.
Type of Material
Doctoral Thesis
Qualification Name
Doctor of Philosophy (Ph.D.)
Publisher
University College Dublin. School of Computer Science
Copyright (Published Version)
2024 the Author
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
File(s)
Name
Thesis - Revised.pdf
Size
3.87 MB
Format
Adobe PDF
Checksum (MD5)
929ecac82e8355ce5da0687cf406b544