Development of an Analysis Pipeline to Identify Mutations of Interest in SARS-CoV-2 Nucleocapsid Protein
Author(s)
Date Issued
2023
Date Available
2025-11-06T16:30:26Z
Abstract
With the persistence of the COVID-19 pandemic and the global spread of SARS-CoV-2 variants of concern (VOCs), surveillance and vaccination strategies have focused mainly on the rapidly evolving viral spike (S) protein because of its role in immune escape and viral transmission. Mutations in other SARS-CoV-2 proteins could also be significant for viral evasion, transmissibility, replication, and pathogenesis. Of particular interest is the nucleocapsid protein (NP), which is involved at multiple stages of the viral life cycle, including genomic RNA (gRNA) entry, gRNA replication, and virion assembly. With millions of sequences shared on SARS-CoV-2 repositories such as GISAID and NCBI, sampling sequences for mutational analysis is challenging because the number of uploaded sequences, sequencing quality, and circulating variants differ across geographical regions. Here, we outline a bioinformatics-based approach to identify and characterise amino acid mutations of interest in SARS-CoV-2 NP from sequences obtained from GISAID. To ensure an inclusive analysis, we used custom R scripts to first filter for complete, high-coverage sequences and to include all six geographical regions (Africa, Asia, Europe, North America, Oceania, and South America). From the initial dataset retrieved from GISAID (n=10,199,092 on 2022-04-13), we constructed a High-Quality Representative (HQR) dataset (n=3,051,084) that represented multiple SARS-CoV-2 variants and epidemiological time periods from the six geographical regions. NP mutational analysis of the HQR dataset revealed 43 High-frequency Mutations (HFMs) along the NP structural domains. We found that the HFMs were associated with, and characteristic of, groups of VOCs and variants of interest (VOIs), and we outline their potential impact on NP SUMOylation and phosphorylation, post-translational modifications (PTMs) involved in NP dimerization and self-assembly.
Importantly, the mutation detection pipeline developed here and the compiled HQR dataset can be adapted to conduct further analyses on other parts of the SARS-CoV-2 genome.
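The filtering and frequency-counting steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the thesis reports custom R scripts, whereas this sketch uses Python. The record fields (`complete`, `high_coverage`, `region`, `aa_substitutions`), the `N_` prefix for NP substitutions, and the `min_fraction` cut-off are all hypothetical, since the abstract does not state how a High-frequency Mutation was thresholded. The sampling across variants and time periods that makes the dataset representative is omitted. The input records are toy data, not results from the thesis.

```python
from collections import Counter

# The six geographical regions named in the abstract.
REGIONS = {"Africa", "Asia", "Europe", "North America", "Oceania", "South America"}


def filter_high_quality(records):
    """Keep records flagged complete and high coverage from one of the six regions.

    The field names are hypothetical placeholders for repository metadata.
    """
    return [
        r for r in records
        if r["complete"] and r["high_coverage"] and r["region"] in REGIONS
    ]


def np_mutation_frequencies(records):
    """Return the fraction of records carrying each NP substitution.

    Assumes (hypothetically) that NP substitutions are labelled with an 'N_' prefix.
    """
    counts = Counter()
    for r in records:
        # A set ensures a substitution listed twice in one record counts once.
        counts.update({m for m in r["aa_substitutions"] if m.startswith("N_")})
    total = len(records)
    return {m: c / total for m, c in counts.items()} if total else {}


def high_frequency_mutations(records, min_fraction):
    """List NP substitutions present in at least min_fraction of the records.

    min_fraction is an assumed cut-off; the abstract does not give one.
    """
    freqs = np_mutation_frequencies(records)
    return sorted(m for m, f in freqs.items() if f >= min_fraction)


# Toy records for illustration only.
records = [
    {"complete": True, "high_coverage": True, "region": "Europe",
     "aa_substitutions": ["N_R203K", "N_G204R", "Spike_D614G"]},
    {"complete": True, "high_coverage": True, "region": "Asia",
     "aa_substitutions": ["N_R203K", "Spike_D614G"]},
    {"complete": True, "high_coverage": False, "region": "Africa",
     "aa_substitutions": ["N_P13L"]},
    {"complete": False, "high_coverage": True, "region": "Oceania",
     "aa_substitutions": ["N_D377Y"]},
    {"complete": True, "high_coverage": True, "region": "South America",
     "aa_substitutions": ["N_T205I"]},
    {"complete": True, "high_coverage": True, "region": "North America",
     "aa_substitutions": ["N_R203K", "N_G204R"]},
]

hq = filter_high_quality(records)
hfms = high_frequency_mutations(hq, min_fraction=0.5)
print(len(hq), hfms)
```

Counting each substitution once per sequence gives a prevalence (the share of sequences carrying it), which is the quantity a frequency threshold would normally be applied to.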
Type of Material
Master Thesis
Qualification Name
Master of Science (M.Sc.)
Publisher
University College Dublin. School of Medicine
Copyright (Published Version)
2023 the Author
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
File(s)
Name
TA_Thesis_16340536.pdf
Size
18.07 MB
Format
Adobe PDF
Checksum (MD5)
c102e10c9836ad10c45bdabc174788bb
Owning collection