Development and validation of colorectal cancer risk prediction tools: A comparison of models
Date Issued
2023-10
Date Available
2024-06-20T11:59:48Z
Abstract
Background: Identification of individuals at elevated risk can improve cancer screening programmes by permitting risk-adjusted screening intensities. Previous work introduced a prognostic model using sex, age and two preceding faecal haemoglobin concentrations to predict the risk of colorectal cancer (CRC) in the next screening round. Using data from three screening rounds, this model attained an area under the receiver-operating-characteristic curve (AUC) of 0.78 for predicting advanced neoplasia (AN). We validated this existing logistic regression (LR) model and attempted to improve it by applying a more flexible machine-learning approach.
Methods: We trained the existing LR model and a newly developed random forest (RF) model using updated data from 219,257 third-round participants of the Dutch CRC screening programme until 2018. For both models, we performed two separate out-of-sample validations using 1,137,599 third-round participants after 2018 and 192,793 fourth-round participants from 2020 onwards. We evaluated the AUC and the relative risks of the predicted high-risk groups for the outcomes AN and CRC.
Results: For third-round participants after 2018, the AUC for predicting AN was 0.77 (95% CI: 0.76–0.77) using LR and 0.77 (95% CI: 0.77–0.77) using RF. For fourth-round participants, the AUCs were 0.73 (95% CI: 0.72–0.74) and 0.73 (95% CI: 0.72–0.74) for the LR and RF models, respectively. For both models, the 5% with the highest predicted risk had a 7-fold risk of AN compared with the average, whereas the lowest 80% had a risk below the population average for third-round participants.
Conclusion: The LR model is a valid risk prediction method in stool-based screening programmes. Although predictive performance declined marginally, the LR model still effectively predicted risk in subsequent screening rounds. The RF model did not improve CRC risk prediction compared with the LR model, probably because of the limited number of available explanatory variables. The LR model remains the preferred prediction tool because of its interpretability.
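The two evaluation measures named in the abstract, the AUC and the relative risk of the highest-risk group, can be illustrated with a minimal sketch. This is not the authors' code or data: the risk scores and outcomes below are synthetic, the function names `auc` and `relative_risk_top` are hypothetical, and the simulated effect size is an arbitrary assumption chosen only to make the example run.

```python
import math
import random


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; tied scores receive average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def relative_risk_top(scores, labels, fraction=0.05):
    """Outcome rate in the top `fraction` by score, divided by the overall rate."""
    n_top = max(1, int(len(scores) * fraction))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_top]
    top_rate = sum(labels[i] for i in top) / n_top
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate


# Synthetic cohort (assumption): one risk score drives a rare binary outcome.
rng = random.Random(0)
scores, labels = [], []
for _ in range(20000):
    x = rng.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-(-4.0 + 1.5 * x)))  # arbitrary logistic risk
    scores.append(x)
    labels.append(1 if rng.random() < p else 0)

cohort_auc = auc(scores, labels)
cohort_rr = relative_risk_top(scores, labels, fraction=0.05)
print(f"AUC: {cohort_auc:.2f}, relative risk of top 5%: {cohort_rr:.1f}")
```

A relative risk above 1 for the top 5% means that group has a higher outcome rate than the cohort as a whole, which is the sense in which the abstract reports a 7-fold risk for the highest-risk 5%.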
Type of Material
Journal Article
Publisher
Elsevier
Journal
International Journal of Medical Informatics
Volume
178
Start Page
1
End Page
8
Copyright (Published Version)
2023 The Authors
Language
English
Status of Item
Peer reviewed
ISSN
1386-5056
This item is made available under a Creative Commons License
File(s)
Name
Mulder et al (2023) Development and validation of colorectal cancer risk prediction tools; A comparison of models; International Journal of Medical Informatics.pdf
Size
1.61 MB
Format
Adobe PDF
Checksum (MD5)
ef1cb18412fbac685cdbf88d0618d361
Owning collection