GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

Files in This Item:
File: insight_publication.pdf (1.54 MB, Adobe PDF)
Title: GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data
Authors: Rue-Albrecht, Kévin
McGettigan, Paul A.
Hernández, Belinda
Nalpas, Nicholas C.
Magee, David A.
Parnell, Andrew C.
Gordon, Stephen V.
MacHugh, David E.
Permanent link: http://hdl.handle.net/10197/7882
Date: 11-Mar-2016
Abstract: Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors.
Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples.
Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
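Illustrative note: the abstract describes deriving a ranking of gene ontology terms from per-gene importance scores. The sketch below shows one plausible way such a summary could work, written in plain Python rather than R. It is not the GOexpress implementation: the gene names, term labels, scores, the function `rank_terms`, and the choice of the mean as the summary statistic are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-gene importance scores, such as a random forest
# classifier might report (values invented for illustration).
gene_scores = {"geneA": 0.9, "geneB": 0.5, "geneC": 0.1, "geneD": 0.7}

# Hypothetical (term, gene) annotation pairs; a gene may belong to
# several terms, as is common with gene ontology annotations.
annotations = [
    ("term_1", "geneA"),
    ("term_1", "geneB"),
    ("term_2", "geneC"),
    ("term_2", "geneD"),
    ("term_3", "geneA"),
    ("term_3", "geneD"),
]


def rank_terms(gene_scores, annotations):
    """Return (term, mean score) pairs, highest mean score first.

    Assumption: a term is summarised by the mean score of its
    annotated genes; genes without a score are ignored.
    """
    members = defaultdict(list)
    for term, gene in annotations:
        if gene in gene_scores:
            members[term].append(gene_scores[gene])
    term_scores = {term: mean(scores) for term, scores in members.items()}
    return sorted(term_scores.items(), key=lambda kv: kv[1], reverse=True)


ranked_terms = rank_terms(gene_scores, annotations)
for term, score in ranked_terms:
    print(f"{term}\t{score:.2f}")
```

With these invented inputs, the term whose member genes score highest on average is listed first. A summary of this kind rewards terms whose genes are consistently informative rather than terms with a single strong gene.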
Funding Details: Department of Agriculture, Food and the Marine
European Commission - Seventh Framework Programme (FP7)
Science Foundation Ireland
University College Dublin
Type of material: Journal Article
Publisher: BioMed Central
Journal: BMC Bioinformatics
Volume: 17
Issue: 126
Start page: 1
End page: 12
Copyright (published version): 2016 the Authors
Keywords: Machine learning; Statistics; Gene expression; Gene ontology; Supervised learning; Classification; Microarray; RNA-sequencing
DOI: 10.1186/s12859-016-0971-3
Language: en
Status of Item: Peer reviewed
Date available: 2016-09-06T12:43:17Z
Appears in Collections: Conway Institute Research Collection
Mathematics and Statistics Research Collection
Insight Research Collection
Agriculture and Food Science Research Collection
Veterinary Medicine Research Collection

Citations: 50 (checked on Dec 12, 2018)

This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this item is made available, or to notes contained in the item itself. Other terms may apply.