Goto

Collaborating Authors

 fairhome


Diversity Drives Fairness: Ensemble of Higher Order Mutants for Intersectional Fairness of Machine Learning Software

Chen, Zhenpeng, Li, Xinyue, Zhang, Jie M., Sarro, Federica, Liu, Yang

arXiv.org Artificial Intelligence

Intersectional fairness is a critical requirement for Machine Learning (ML) software, demanding fairness across subgroups defined by multiple protected attributes. This paper introduces FairHOME, a novel ensemble approach using higher order mutation of inputs to enhance intersectional fairness of ML software during the inference phase. Inspired by social science theories highlighting the benefits of diversity, FairHOME generates mutants representing diverse subgroups for each input instance, thus broadening the array of perspectives to foster a fairer decision-making process. Unlike conventional ensemble methods that combine predictions made by different models, FairHOME combines predictions for the original input and its mutants, all generated by the same ML model, to reach a final decision. Notably, FairHOME is even applicable to deployed ML software as it bypasses the need for training new models. We extensively evaluate FairHOME against seven state-of-the-art fairness improvement methods across 24 decision-making tasks using widely adopted metrics. FairHOME consistently outperforms existing methods across all metrics considered. On average, it enhances intersectional fairness by 47.5%, surpassing the currently best-performing method by 9.6 percentage points.


FairHome: A Fair Housing and Fair Lending Dataset

Bagalkotkar, Anusha, Karmakar, Aveek, Arnson, Gabriel, Linda, Ondrej

arXiv.org Artificial Intelligence

We present a Fair Housing and Fair Lending dataset (FairHome): A dataset with around 75,000 examples across 9 protected categories. To the best of our knowledge, FairHome is the first publicly available dataset labeled with binary labels for compliance risk in the housing domain. We demonstrate the usefulness and effectiveness of such a dataset by training a classifier and using it to detect potential violations when using a large language model (LLM) in the context of real-estate transactions. We benchmark the trained classifier against state-of-the-art LLMs including GPT-3.5, GPT-4, LLaMA-3, and Mistral Large in both zero-shot and fewshot contexts. Our classifier outperformed with an F1-score of 0.91, underscoring the effectiveness of our dataset. WARNING: Some of the examples included in the paper are not polite, in so far as they reveal bias that might feel discriminatory to the readers.