Controlled Model Debiasing through Minimal and Interpretable Updates
Federico Di Gennaro, Thibault Laugel, Vincent Grari, Marcin Detyniecki
Traditional approaches to learning fair machine learning models often require rebuilding models from scratch, generally without accounting for potentially existing previous models. In a context where models need to be retrained frequently, this can lead to inconsistent model updates, as well as redundant and costly validation testing. To address this limitation, we introduce the notion of controlled model debiasing, a novel supervised learning task relying on two desiderata: that the differences between the new fair model and the existing one should be (i) interpretable and (ii) minimal. After providing theoretical guarantees for this new problem, we introduce a novel algorithm for algorithmic fairness, COMMOD, that is both model-agnostic and does not require the sensitive attribute at test time. In addition, our algorithm is explicitly designed to enforce (i) minimal and (ii) interpretable changes between biased and debiased predictions, a property that, while highly desirable in high-stakes applications, is rarely prioritized as an explicit objective in the fairness literature. Our approach combines a concept-based architecture with adversarial learning, and we demonstrate through empirical results that it achieves performance comparable to state-of-the-art debiasing methods while making minimal and interpretable changes to the predictions.

1 Introduction

The increasing adoption of machine learning models in high-stakes domains, such as criminal justice (Kleinberg et al., 2016) and credit lending (Bruckner, 2018), has raised significant concerns about the potential biases that these models may reproduce and amplify, particularly against historically marginalized groups. Recent public discourse, along with regulatory developments such as the European AI Act (2024/1689), has further underscored the need for adapting AI systems to ensure fairness and trustworthiness (Bringas Colmenarejo et al., 2022). Consequently, many of the machine learning models deployed by organizations are, or may soon be, subject to these emerging regulatory requirements. Yet, such organizations frequently invest significant resources (e.g., in validation and testing) in their existing models, which retraining from scratch would largely discard.

The field of algorithmic fairness has experienced rapid growth in recent years, with numerous bias mitigation strategies proposed (Romei & Ruggieri, 2014; Mehrabi et al., 2021). These approaches can be broadly categorized into three types, based on the stage of the machine learning pipeline at which fairness is enforced: pre-processing (e.g., Belrose et al., 2024), in-processing (e.g., Zhang et al., 2018), and post-processing (e.g., Kamiran et al., 2010). While the two former categories do not account for any pre-existing biased model being available for the task, post-processing approaches aim to impose fairness by directly modifying the predictions of a biased classifier.
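To make the setting concrete, the sketch below illustrates one way the general idea could look in code. It is not the COMMOD architecture (whose concept-based components and exact losses are not detailed in this excerpt), but a generic adversarial-debiasing setup in the spirit of Zhang et al. (2018), with an added penalty that keeps the new predictions close to those of a frozen, pre-existing biased model; the names Editor, Adversary, debias_step and the weights lambda_adv, lambda_min are illustrative assumptions.

    # Hypothetical sketch, not the authors' implementation: an "editor" network
    # adds a small correction to a frozen biased classifier's logits, an adversary
    # tries to recover the sensitive attribute from the corrected predictions,
    # and an L1 penalty keeps the corrections (i.e., the prediction changes) minimal.
    import torch
    import torch.nn as nn

    class Editor(nn.Module):
        """Predicts an additive correction to the biased model's logit."""
        def __init__(self, d_in, d_hidden=32):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                     nn.Linear(d_hidden, 1))
        def forward(self, x):
            return self.net(x).squeeze(-1)

    class Adversary(nn.Module):
        """Tries to predict the sensitive attribute from the debiased logit."""
        def __init__(self, d_hidden=16):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(1, d_hidden), nn.ReLU(),
                                     nn.Linear(d_hidden, 1))
        def forward(self, logit):
            return self.net(logit.unsqueeze(-1)).squeeze(-1)

    def debias_step(x, y, s, f0, editor, adversary, opt_e, opt_a,
                    lambda_adv=1.0, lambda_min=0.1):
        """One training step; s (sensitive attribute) is used only at training time."""
        bce = nn.BCEWithLogitsLoss()
        with torch.no_grad():
            logit0 = f0(x)                 # frozen, pre-existing biased model
        delta = editor(x)                  # correction to the original logit
        logit = logit0 + delta             # debiased prediction

        # 1) update the adversary: predict s from the debiased logit
        opt_a.zero_grad()
        adv_loss = bce(adversary(logit.detach()), s)
        adv_loss.backward()
        opt_a.step()

        # 2) update the editor: stay accurate, fool the adversary, change little
        opt_e.zero_grad()
        loss = (bce(logit, y)
                - lambda_adv * bce(adversary(logit), s)   # adversarial fairness term
                + lambda_min * delta.abs().mean())        # minimal-change penalty
        loss.backward()
        opt_e.step()
        return loss.item()

In such a setup, lambda_min controls how far the debiased predictions may move from the original model's outputs, echoing the minimality desideratum above; the interpretability of the changes would additionally depend on how the correction is structured (e.g., through concepts), which this generic sketch does not capture.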
Feb-28-2025