Underestimation Bias in Machine Learning
Author(s)
William Blanzeisky
Date Issued
2024
Date Available
2025-11-14T16:27:51Z
Abstract
Bias in Machine Learning (ML) is often attributed to biased training data, leading to the assertion that ML algorithms merely mirror existing biases rather than being inherently biased. This dissertation, however, explores intrinsic bias in ML algorithms, specifically underestimation bias, which arises from limitations in the training data and constrained model capacity. Such bias is a pressing concern because the tuning of ML models focuses predominantly on accuracy, often disregarding bias and fairness.

To mitigate underestimation bias, we introduce two classes of strategies: pre-processing and in-processing. Our pre-processing technique combines random perturbation with counterfactual oversampling to rectify dataset imbalance, augmenting the representation of the minority class so that a model can learn effectively from both majority and minority groups. The in-processing strategy employs multi-objective optimization, enhancing fairness without making any assumption about the optimal trade-off; it yields a set of models, each reflecting a different trade-off between objectives. We present empirical evidence that these methods alleviate underestimation bias without compromising model performance.

We also examine the presence of underestimation bias in case-based reasoning (CBR) systems, proposing a case-base maintenance (CBM) technique and an alternative metric learning strategy to alleviate it. The CBM method pinpoints cases in the training data that instigate biased predictions, while the metric learning approach alleviates bias by learning a new, unbiased distance metric from the training data. We demonstrate the effectiveness of both methods in mitigating underestimation, noting the comparative transparency of the CBM technique over the more opaque metric learning strategy.

Finally, this dissertation examines the relationship between underestimation bias and other fairness metrics, including Equalized Odds (EO), bias amplification, and calibration. While underestimation and EO are related, they can diverge under certain circumstances because underestimation is less tied to accuracy than EO. Regarding calibration, we show that underestimation is related to the group-based calibration property, as both demand that predictions accurately reflect the actual distribution of the data. However, whereas group-based calibration is defined on a classifier's probability outputs, underestimation concerns only how closely the hard class labels match their actual distribution in the data.
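As a minimal sketch of the central metric, assuming the common formulation of underestimation as the ratio of a classifier's predicted positive rate to the actual positive rate within a protected group (the function and variable names here are illustrative, not taken from the thesis):

import numpy as np

def underestimation_score(y_true, y_pred, group):
    """Ratio of predicted to actual positive rate within one group.

    A score below 1 means the classifier predicts the positive
    (desirable) outcome for this group less often than it actually
    occurs in the data, i.e. underestimation.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mask = np.asarray(group).astype(bool)
    actual_rate = y_true[mask].mean()     # P(y = 1 | group)
    predicted_rate = y_pred[mask].mean()  # P(ŷ = 1 | group)
    return predicted_rate / actual_rate

# Toy example: the protected group's positive outcome is predicted
# at half its true rate, so the score signals underestimation.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
group  = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 marks the protected group
print(underestimation_score(y_true, y_pred, group))  # 0.5

Note that the score is computed from hard predicted labels rather than probability outputs, which reflects the distinction from group-based calibration drawn in the abstract.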
Type of Material
Doctoral Thesis
Qualification Name
Doctor of Philosophy (Ph.D.)
Publisher
University College Dublin. School of Computer Science
Copyright (Published Version)
© 2024 the Author
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
File(s)
Name
PhD_Thesis_of_William_Blanzeisky__Revised (1).pdf
Size
6.88 MB
Format
Adobe PDF
Checksum (MD5)
c71a2ce24e80b82ce9d0638368dc6064
Owning collection