Quantifying Fairness and Performance Trade-offs in Bias-Mitigated Machine Learning and Foundation Models
Author(s)
Date Issued
2026
Date Available
2026-02-06T12:18:52Z
Abstract
Machine learning and foundation models increasingly shape decision-making in sensitive, high-stakes domains. While bias-mitigation techniques aim to reduce bias in models, they can produce unintended side effects, redistributing harm rather than eliminating it. This thesis investigates the impacts of bias mitigation in both predictive machine learning and foundation models across four experiments. The first study develops a meta-classifier pipeline to systematically identify cohorts of individuals negatively affected by bias-mitigation strategies applied to predictive machine learning models trained on structured tabular data. The findings reveal that every bias-mitigation approach examined negatively affects a non-trivial proportion of individuals. The second experiment adapts the meta-classifier pipeline to foundation models operating on unstructured textual data, revealing subgroup-level harms similar to those found in the first study. The third experiment evaluates the trade-offs between fairness and performance in foundation models using a socially diverse fairness question-answering benchmark, finding that bias-mitigation efforts improve fairness along the targeted dimensions while degrading it along non-targeted dimensions and reducing overall performance. The fourth study probes changes in the embedding spaces of bias-mitigated foundation models, demonstrating that bias-mitigation techniques reshape internal representations and weaken gender-occupation associations. Together, these findings challenge the assumption that fairness gains are universally beneficial, and the accompanying experiments can serve as a basis for granular evaluation tools that identify subtle harms introduced by debiasing strategies. To support further evaluation and deepen understanding of the impacts of bias-mitigation algorithms, this thesis also introduces a publicly available dataset for investigating gender bias in the embedding spaces of decoder-only foundation models.
Type of Material
Master Thesis
Qualification Name
Master of Science (M.Sc.)
Publisher
University College Dublin. School of Computer Science
Copyright (Published Version)
2026 the Author
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
File(s)
Name
17712081_nizhnichenkov_msc_thesis.pdf
Size
4.07 MB
Format
Adobe PDF
Checksum (MD5)
1c50351c3f8eeb210345959d705bf846
Owning collection