Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference

Neural Information Processing Systems

Group fairness is measured via parity of quantitative metrics across different protected demographic groups. In this paper, we investigate the problem of reliably assessing group fairness metrics when labeled examples are few but unlabeled examples are plentiful. We propose a general Bayesian framework that can augment labeled data with unlabeled data to produce more accurate and lower-variance estimates compared to methods based on labeled data alone. Our approach estimates calibrated scores (for unlabeled examples) of each group using a hierarchical latent variable model conditioned on labeled examples. This in turn allows for inference of posterior distributions for an array of group fairness metrics with a notion of uncertainty. We demonstrate that our approach leads to significant and consistent reductions in estimation error across multiple well-known fairness datasets, sensitive attributes, and predictive models. The results clearly show the benefits of using both unlabeled data and Bayesian inference in assessing whether a prediction model is fair or not.
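To make the idea of a posterior distribution over a group fairness metric concrete, here is a minimal sketch that uses labeled data only. It is not the authors' hierarchical latent variable model. It assumes independent Beta(1, 1) priors on each group's accuracy, and the function name and the counts are hypothetical.

```python
import numpy as np

def posterior_accuracy_gap(correct_a, n_a, correct_b, n_b, n_samples=20000, seed=0):
    """Draw posterior samples of accuracy(A) - accuracy(B).

    Assumes independent Beta(1, 1) priors on each group's accuracy and a
    binomial likelihood for the number of correctly classified labeled examples.
    """
    rng = np.random.default_rng(seed)
    acc_a = rng.beta(1 + correct_a, 1 + n_a - correct_a, size=n_samples)
    acc_b = rng.beta(1 + correct_b, 1 + n_b - correct_b, size=n_samples)
    return acc_a - acc_b

# Hypothetical labeled counts: 18/20 correct in group A, 14/20 in group B.
gap = posterior_accuracy_gap(correct_a=18, n_a=20, correct_b=14, n_b=20)
mean_gap = float(gap.mean())
lower, upper = np.percentile(gap, [2.5, 97.5])
print(f"posterior mean gap: {mean_gap:.3f}, 95% interval: [{lower:.3f}, {upper:.3f}]")
```

With only 20 labeled examples per group the interval is wide, which is the situation the abstract describes as motivating the use of unlabeled data.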


Review for NeurIPS paper: Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference

Additional Feedback: This paper proposes to use Bayesian estimates of fairness metrics. It combines these with Bayesian calibration models (one for each protected attribute value in this particular case) in order to use unlabelled data. In light of existing work on Bayesian modelling of fairness (Foulds et al., 2019), the contribution is rather minor and is limited to the case where unlabelled data are available. Because the approach is based on calibration, it seems limited to rather specific notions of fairness where Bayesian calibration can be usefully applied. Although the definition of calibration in l. 64 is correct, in l. 105-107 you write that $s_j = P_M(y_j = 1 \mid s_j)$. Since j is a specific example, there should not be any randomness here.
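For context, the calibration-based use of unlabelled data discussed in this review can be sketched as follows. This is an illustration under stated assumptions, not the paper's model: it assumes a logit-linear (Platt-style) calibration map with hypothetical per-group parameters `a` and `b` taken as given, whereas the paper infers calibration within a Bayesian hierarchical model. The function names are also hypothetical.

```python
import numpy as np

def calibrate(scores, a=1.0, b=0.0, eps=1e-6):
    """Map raw scores to calibrated probabilities via sigmoid(a * logit(s) + b)."""
    s = np.clip(np.asarray(scores, dtype=float), eps, 1 - eps)
    logit = np.log(s / (1 - s))
    return 1.0 / (1.0 + np.exp(-(a * logit + b)))

def expected_accuracy(scores, a=1.0, b=0.0, threshold=0.5):
    """Estimate accuracy on unlabelled examples from calibrated scores.

    If z is the calibrated probability that y = 1, a positive prediction is
    correct with probability z and a negative prediction with probability 1 - z.
    """
    scores = np.asarray(scores, dtype=float)
    z = calibrate(scores, a, b)
    predicted_positive = scores >= threshold
    p_correct = np.where(predicted_positive, z, 1.0 - z)
    return float(p_correct.mean())

# Hypothetical unlabelled scores for two groups with different calibration maps.
group_a = expected_accuracy([0.9, 0.2, 0.6])         # scores already calibrated
group_b = expected_accuracy([0.9, 0.2, 0.6], a=0.5)  # scores are overconfident
print(group_a, group_b, group_a - group_b)
```

Here the same raw scores imply a lower expected accuracy for the group whose scores are overconfident, which is why a separate calibration model per protected attribute value matters for the estimated gap.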


Review for NeurIPS paper: Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference

This paper focuses on the problem of leveraging unlabelled data to generate better estimates of fairness metrics given limited labelled data. All three reviewers agree that the manuscript makes a valuable contribution and is conceptually and mathematically sound. The significance of the contribution (an auditing tool only, rather than an auditing tool plus a mitigation tool) is, however, on the low side.

