Computer Science Research Collection

Recent Submissions

  • Publication
    Benford's Law: Hammering a Square Peg into a Round Hole?
    Many authors have discussed the reasons why Benford's distribution for the most significant digits is seemingly so widespread. However, the discussion is not settled because there is no theorem explaining its prevalence, in particular for naturally occurring scale-invariant data. Here we review Benford's distribution for continuous random variables under scale invariance. The implausibility of strict scale invariance leads us to a generalisation of Benford's distribution based on Pareto variables. This new model is more realistic, because real datasets are more prone to complying with a relaxed, rather than strict, definition of scale invariance. We also argue against forensic detection tests based on the distribution of the most significant digit. To show the arbitrariness of these tests, we give discrete distributions of the first coefficient of a continued fraction which hold under exactly the same conditions as Benford's distribution and its generalisation.
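    As a rough illustration of the classical distribution the abstract above refers to (not the paper's Pareto-based generalisation), the sketch below computes the standard first-digit probabilities P(d) = log10(1 + 1/d) and checks empirically that rescaling a log-uniform sample leaves the leading-digit frequencies close to them. The function names, the sample size, the seed and the scale factor 7.3 are illustrative assumptions, not taken from the publication.

```python
import math
import random


def benford_first_digit_probs():
    """Classical Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Most significant decimal digit of a positive float."""
    # Scientific notation always starts with the leading digit.
    return int(f"{x:.17e}"[0])


def empirical_first_digit_freqs(values):
    """Relative frequency of each leading digit 1..9 in `values`."""
    counts = {d: 0 for d in range(1, 10)}
    for v in values:
        counts[first_digit(v)] += 1
    n = len(values)
    return {d: c / n for d, c in counts.items()}


probs = benford_first_digit_probs()

# Assumption: a strictly scale-invariant sample over one decade,
# generated as 10**U with U uniform on [0, 1).
rng = random.Random(0)
sample = [10 ** rng.random() for _ in range(100_000)]

# Rescaling by an arbitrary constant should leave the digit law unchanged.
scaled = [7.3 * x for x in sample]
freqs = empirical_first_digit_freqs(scaled)

for d in range(1, 10):
    print(f"{d}: Benford {probs[d]:.4f}  empirical {freqs[d]:.4f}")
```

    The log-uniform sample is the idealised, strictly scale-invariant case; real datasets would be expected to deviate from it, which is the motivation the abstract gives for a relaxed model.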
  • Publication
    In Silico Protein Motif Discovery and Structural Analysis
    A wealth of in silico tools is available for protein motif discovery and structural analysis. The aim of this chapter is to collect some of the most common and useful tools and to guide the biologist in their use. A detailed explanation is provided for the use of Distill, a suite of web servers for the prediction of protein structural features and the prediction of full-atom 3D models from a protein sequence. Besides this, we also provide pointers to many other tools available for motif discovery and secondary and tertiary structure prediction from a primary amino acid sequence. The prediction of protein intrinsic disorder and the prediction of functional sites and SLiMs are also briefly discussed. Given that user queries vary greatly in size, scope and character, the trade-offs in speed, accuracy and scale need to be considered when choosing which methods to adopt.
  • Publication
    XGboost-based Method for Seizure Detection in Mouse Models of Epilepsy
    Epilepsy is a chronic neurological disease which affects over 50 million people worldwide [1], caused by the disruption of the finely tuned inhibitory and excitatory balance in brain networks, manifesting clinically as seizures. Electroencephalographic (EEG) monitoring in rodent disease models of epilepsy is critical in the understanding of disease mechanisms and the development of anti-seizure drugs. However, the visual annotation of EEG traces is time-consuming, and is complicated by different models and seizure types. Automated annotation systems can help to solve these problems by reducing expert annotation time and increasing the throughput and reliability of seizure quantification. As machine learning is becoming increasingly popular for modelling sequential signals such as EEG, several researchers have tried machine learning to detect seizures in EEG traces from mouse models of epilepsy. Most existing work [2], [3] can only detect seizures in a single mouse model of epilepsy, and research on multiple mouse models has been limited to date.
  • Publication
    Epileptic Seizure Detection in Clinical EEGs Using an XGboost-based Method
    (IEEE, 2020-12-05)
    Epilepsy is one of the most common serious disorders of the brain, affecting about 50 million people worldwide. Electroencephalography (EEG) is an electrophysiological monitoring method which is used to measure tiny electrical changes of the brain, and it is frequently used to diagnose epilepsy. However, the visual annotation of EEG traces is time-consuming and typically requires experienced experts. Therefore, automatic seizure detection can help to reduce the time required to annotate EEGs. Automatic detection of seizures in clinical EEGs has been limited to date. In this study, we present an XGBoost-based method to detect seizures in EEGs from the TUH-EEG Corpus. 4,597 EEG files were used to train the method, 1,013 EEGs were used as a validation set, and 1,026 EEG files were used to test the method. Sixty-four features were selected as the input to the training set, and the Synthetic Minority Over-sampling Technique was used to balance the dataset. Our XGBoost-based method achieved sensitivity and false alarm/24 hours of 20.00% and 15.59, respectively, in the test set. The proposed XGBoost-based method has the potential to help researchers automatically analyse seizures in clinical EEG recordings.
  • Publication
    Prediction of quality of life in people with ALS: on the road towards explainable clinical decision support
    Amyotrophic Lateral Sclerosis (ALS) is a rare neurodegenerative disease that causes a rapid decline in motor functions and has a fatal trajectory. ALS is currently incurable, so the aim of the treatment is mostly to alleviate symptoms and improve quality of life (QoL) for the patients. The goal of this study is to develop a Clinical Decision Support System (CDSS) to alert clinicians when a patient is at risk of experiencing low QoL. The source of data was the Irish ALS Registry and interviews with the 90 patients and their primary informal caregiver at three time-points. In this dataset, there were two different scores to measure a person's overall QoL, based on the McGill QoL (MQoL) Questionnaire, and we worked towards the prediction of both. We used Extreme Gradient Boosting (XGBoost) for the development of the predictive models, which was compared to a logistic regression baseline model. Additionally, we used Synthetic Minority Over-sampling Technique (SMOTE) to examine whether it would increase model performance and SHAP (SHapley Additive exPlanations) as a technique to provide local and global explanations to the outputs as well as to select the most important features. The total calculated MQoL score was predicted accurately using three features - age at disease onset, ALSFRS-R score for orthopnoea and the caregiver's status pre-caregiving - with an F1-score on the test set equal to 0.81, recall of 0.78, and precision of 0.84. The addition of two extra features (caregiver's age and the ALSFRS-R score for speech) produced similar outcomes (F1-score 0.79, recall 0.70 and precision 0.90).
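    The F1-scores quoted in the abstract above can be cross-checked against the reported precision and recall using the standard definition F1 = 2PR / (P + R). The following is only a consistency check on the rounded figures, not a reproduction of the study's models; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract.
three_feature_f1 = f1_score(precision=0.84, recall=0.78)
five_feature_f1 = f1_score(precision=0.90, recall=0.70)

print(f"three features: F1 = {three_feature_f1:.2f}")  # 0.81
print(f"five features:  F1 = {five_feature_f1:.2f}")   # 0.79
```

    Both recomputed values agree with the reported F1-scores to two decimal places, so the three figures in each configuration are mutually consistent.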