Authors: Panah, Davoud Shariat; Hines, Andrew; McKeever, Susan
Record date: 2024-02-13
Rights: 2020 the A
Date issued: 2020-12-08
ISSN: 1613-0073
Handle: http://hdl.handle.net/10197/25432
Conference: The 28th Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2020), Technical University Dublin, Ireland (held online due to coronavirus outbreak), 7-8 December 2020
Abstract: In the last few years, the automatic classification of heart sounds has been widely studied as a screening method for heart disease. Some of these studies have achieved high accuracies in heart abnormality prediction. However, for such models to assist clinicians in the detection of heart abnormalities, it is of critical importance that they are generalisable, working on unseen real-world data. Despite the importance of generalisability, the presence of bias in the leading heart sound datasets used in these studies has remained unexplored. In this paper, we explore the presence of potential bias in heart sound datasets. Using a small set of spectral features for heart sound representation, we demonstrate experimentally that it is possible to detect sub-datasets of PhysioNet, the leading dataset of the field, with 98% accuracy. We also show that sensors which have been used to capture recordings of each dataset are likely the main cause of the bias in these datasets. Lack of awareness of this bias works against generalised models for heart sound diagnostics. Our findings call for further research on the bias issue in heart sound datasets and its impact on the generalisability of heart abnormality prediction models.
Language: en
Keywords: Bias; PhysioNet; Heart sound; Machine learning
Title: Exploring Composite Dataset Biases for Heart Sound Classification
Type: Conference Publication
Date: 2021-01-26
License: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/