Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging
Tang, Jie, Xie, Chuanlong, Zeng, Xianli, Zhu, Lixing
Machine learning models in high-stakes applications, such as recidivism prediction and automated personnel selection, often exhibit systematic performance disparities across sensitive subpopulations, raising critical concerns regarding algorithmic bias. Fairness auditing addresses these risks through two primary functions: certification, which verifies adherence to fairness constraints; and flagging, which isolates specific demographic groups experiencing disparate treatment. However, existing auditing techniques are frequently limited by restrictive distributional assumptions or prohibitive computational overhead. We propose a novel empirical likelihood-based (EL) framework that constructs robust statistical measures for model performance disparities. Unlike traditional methods, our approach is non-parametric; the proposed disparity statistics follow asymptotically chi-square or mixed chi-square distributions, ensuring valid inference without assuming underlying data distributions. This framework uses a constrained optimization profile that admits stable numerical solutions, facilitating both large-scale certification and efficient subpopulation discovery. Empirically, the EL methods outperform bootstrap-based approaches, yielding coverage rates closer to nominal levels while reducing computational latency by several orders of magnitude. We demonstrate the practical utility of this framework on the COMPAS dataset, where it successfully flags intersectional biases, specifically identifying a significantly higher positive prediction rate for African-American males under 25 and a systemic under-prediction for Caucasian females relative to the population mean.
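The chi-square calibration described above can be illustrated with Owen-style empirical likelihood for a single mean. The paper's disparity statistics are more general, so the snippet below is only a minimal sketch of the flagging idea, with our own illustrative function names and synthetic data: it tests whether a subgroup's per-sample accuracy indicators are consistent with a given overall accuracy, calibrating -2 log R against a chi-square(1) reference rather than a bootstrap.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_test_mean(x, mu0):
    """Owen's empirical likelihood ratio test for H0: E[X] = mu0.
    Returns the statistic -2 log R, asymptotically chi-square(1)."""
    x = np.asarray(x, dtype=float)
    d = x - mu0
    if d.min() >= 0 or d.max() <= 0:
        return np.inf  # mu0 outside the convex hull of the data: H0 untenable
    # Solve sum(d_i / (1 + lam * d_i)) = 0 for the Lagrange multiplier lam,
    # keeping every implied observation weight positive (1 + lam * d_i > 0).
    eps = 1e-10
    lo = -1.0 / d.max() + eps
    hi = -1.0 / d.min() - eps
    lam = brentq(lambda l: np.sum(d / (1.0 + l * d)), lo, hi)
    return 2.0 * np.sum(np.log1p(lam * d))

# Flagging sketch: does a subgroup's accuracy deviate from the overall rate?
rng = np.random.default_rng(0)
overall_acc = 0.80
subgroup_correct = rng.binomial(1, 0.65, size=200)  # a genuinely biased subgroup
stat = el_test_mean(subgroup_correct, overall_acc)
pval = chi2.sf(stat, df=1)
```

Because the reference distribution is known in closed form, each subgroup test costs one scalar root-find instead of hundreds of bootstrap refits, which is where the latency advantage claimed in the abstract comes from.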
Intersectional Fairness via Mixed-Integer Optimization
Němeček, Jiří, Kozdoba, Mark, Kryvoviaz, Illia, Pevný, Tomáš, Mareček, Jakub
The deployment of Artificial Intelligence in high-risk domains, such as finance and healthcare, necessitates models that are both fair and transparent. While regulatory frameworks, including the EU's AI Act, mandate bias mitigation, they are deliberately vague about the definition of bias. In line with existing research, we argue that true fairness requires addressing bias at the intersections of protected groups. We propose a unified framework that leverages Mixed-Integer Optimization (MIO) to train intersectionally fair and intrinsically interpretable classifiers. We prove the equivalence of two measures of intersectional fairness (MSD and SPSF) in detecting the most unfair subgroup and empirically demonstrate that our MIO-based algorithm improves performance in finding bias. We train high-performing, interpretable classifiers that bound intersectional bias below an acceptable threshold, offering a robust solution for regulated industries and beyond.
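For a small number of protected attributes, the "most unfair subgroup" that MSD and SPSF both detect can be found by exhaustive search; the MIO formulation is what makes this search scale. The brute-force sketch below is our own illustration (not the paper's algorithm), using a statistical-parity gap as the disparity measure over all intersections of protected-attribute values:

```python
import itertools
import numpy as np
import pandas as pd

def most_unfair_subgroup(attrs, y_pred):
    """Exhaustively scan intersections of protected-attribute values and
    return the subgroup whose positive-prediction rate deviates most from
    the overall rate (a brute-force statistical-parity analogue of MSD)."""
    overall = y_pred.mean()
    worst_gap, worst_group = 0.0, None
    cols = list(attrs.columns)
    for r in range(1, len(cols) + 1):                # choose which attributes
        for subset in itertools.combinations(cols, r):
            # every observed combination of values within this attribute subset
            for vals, idx in attrs.groupby(list(subset)).groups.items():
                vals = vals if isinstance(vals, tuple) else (vals,)
                gap = abs(y_pred[idx].mean() - overall)
                if gap > worst_gap:
                    worst_gap, worst_group = gap, dict(zip(subset, vals))
    return worst_gap, worst_group

# Synthetic check: predictions are inflated only at the (F, B) intersection,
# so neither marginal group alone reveals the full disparity.
rng = np.random.default_rng(1)
n = 400
attrs = pd.DataFrame({"gender": rng.choice(["M", "F"], n),
                      "race": rng.choice(["A", "B"], n)})
p = np.where((attrs.gender == "F") & (attrs.race == "B"), 0.9, 0.3)
y_pred = pd.Series(rng.binomial(1, p), index=attrs.index)
gap, group = most_unfair_subgroup(attrs, y_pred)
```

The number of intersections grows exponentially in the attribute count, which is precisely why a naive scan like this motivates the MIO approach.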
Contrasting Global and Patient-Specific Regression Models via a Neural Network Representation
Behrens, Max, Stolz, Daiana, Papakonstantinou, Eleni, Nolde, Janis M., Bellerino, Gabriele, Rohde, Angelika, Hess, Moritz, Binder, Harald
When developing clinical prediction models, it can be challenging to balance between global models that are valid for all patients and personalized models tailored to individuals or potentially unknown subgroups. To aid such decisions, we propose a diagnostic tool for contrasting global regression models and patient-specific (local) regression models. The core utility of this tool is to identify where and for whom a global model may be inadequate. We focus on regression models and specifically suggest a localized regression approach that identifies regions in the predictor space where patients are not well represented by the global model. As localization becomes challenging when dealing with many predictors, we propose modeling in a dimension-reduced latent representation obtained from an autoencoder. Using such a neural network architecture for dimension reduction enables learning a latent representation simultaneously optimized for both good data reconstruction and for revealing local outcome-related associations suitable for robust localized regression. We illustrate the proposed approach with a clinical study involving patients with chronic obstructive pulmonary disease. Our findings indicate that the global model is adequate for most patients but that indeed specific subgroups benefit from personalized models. We also demonstrate how to map these subgroup models back to the original predictors, providing insight into why the global model falls short for these groups. Thus, the principal application and diagnostic yield of our tool is the identification and characterization of patients or subgroups whose outcome associations deviate from the global model.

Introduction

In clinical research, conclusions about potential relationships between patient characteristics and outcomes are often based on regression models. More specifically, there might not be just random variability across patients' parameters, e.g. as considered in regression modeling with random effects (Pinheiro and Bates, 2000); rather, different regions in the space spanned by the patient characteristics might require different parameters. For example, the relation of some patient characteristics to the outcome might be more pronounced for older patients with high body weight, without a corresponding pre-defined subgroup indicator. While sticking to a global model keeps interpretation simple and is beneficial in terms of statistical stability, it would at least be useful to have a diagnostic tool for judging the potential extent of deviations from the global model.
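A stripped-down version of such a diagnostic can be built without the autoencoder. The sketch below is our own illustration: the paper's latent representation comes from a jointly trained autoencoder, whereas here any latent matrix Z (e.g. from a generic dimension reduction) is simply passed in. It fits a global OLS model and, for each patient, a kernel-weighted local linear model in the latent space, reporting how much the local fit improves on the global one per patient:

```python
import numpy as np

def local_vs_global_diagnostic(X, y, Z, bandwidth=0.5):
    """For each patient, contrast the global OLS fit on X with a
    kernel-weighted local linear fit in the latent space Z.
    Returns per-patient squared-residual improvement of local over global;
    large values flag patients poorly served by the global model."""
    n = len(y)
    Xg = np.column_stack([np.ones(n), X])
    beta_g, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    resid_g = y - Xg @ beta_g
    Zl = np.column_stack([np.ones(n), Z])
    improvement = np.empty(n)
    for i in range(n):
        # Gaussian kernel weights centered at patient i in latent space
        w = np.exp(-np.sum((Z - Z[i]) ** 2, axis=1) / (2 * bandwidth ** 2))
        sw = np.sqrt(w)
        beta_l, *_ = np.linalg.lstsq(sw[:, None] * Zl, sw * y, rcond=None)
        resid_l = y[i] - Zl[i] @ beta_l
        improvement[i] = resid_g[i] ** 2 - resid_l ** 2
    return improvement

# Synthetic check: one region of the predictor space has a different slope,
# so the local model should help mainly there.
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 300)
y = x + 2 * x * (x > 1) + 0.1 * rng.normal(size=300)
imp = local_vs_global_diagnostic(x[:, None], y, x[:, None], bandwidth=0.3)
```

In the synthetic example, the improvement scores are systematically larger for the hidden subgroup (x > 1), which is exactly the kind of region the diagnostic is meant to surface.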
What if the idea of the autism spectrum is completely wrong?
For years, we've thought of autism as lying on a spectrum, but emerging evidence suggests that it comes in several distinct types. "On the spectrum" - these three words have become synonymous with autism, yet behind them lies a common misunderstanding. The idea of "the spectrum" suggests that all autistic people share similar experiences and behave in similar ways - only to a greater or lesser extent. The reality couldn't be further from the truth. Some autistic people may not speak at all; others are hyperverbal and extremely fluent.
Differential Privacy Has Disparate Impact on Model Accuracy
Differential privacy (DP) is a popular mechanism for training machine learning models with bounded leakage about the presence of specific points in the training data. The cost of differential privacy is a reduction in the model's accuracy. We demonstrate that in neural networks trained using differentially private stochastic gradient descent (DP-SGD), this cost is not borne equally: the accuracy of DP models drops much more for underrepresented classes and subgroups. For example, a gender classification model trained using DP-SGD exhibits much lower accuracy for black faces than for white faces. Critically, this gap is bigger in the DP model than in the non-DP model, i.e., if the original model is unfair, the unfairness becomes worse once DP is applied. We demonstrate this effect for a variety of tasks and models, including sentiment analysis of text and image classification. We then explain why DP training mechanisms such as gradient clipping and noise addition have a disproportionate effect on underrepresented and more complex subgroups, resulting in a disparate reduction of model accuracy.
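The mechanism can be sketched directly: DP-SGD clips each per-example gradient to a fixed norm before noisy aggregation, so examples with large gradients - typically those from underrepresented or harder subgroups the model fits poorly - lose proportionally more of their signal. The snippet below is our own minimal illustration with synthetic gradients, not an actual DP-SGD training loop:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_mult, rng):
    """One DP-SGD aggregation: clip each per-example gradient to clip_norm,
    sum, and add Gaussian noise calibrated to the clipping norm.
    Also returns each example's retained fraction of gradient norm."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / norms)      # per-example clipping factor
    clipped = per_example_grads * scale
    noise = rng.normal(0.0, noise_mult * clip_norm,
                       size=per_example_grads.shape[1])
    return clipped.sum(axis=0) + noise, scale.ravel()

# Majority examples are well fit (small gradients) and pass almost unclipped;
# minority examples are poorly fit (large gradients) and are clipped hard.
rng = np.random.default_rng(0)
g_majority = rng.normal(0.0, 0.2, size=(180, 10))   # norms ~ 0.6, below clip
g_minority = rng.normal(0.5, 0.2, size=(20, 10))    # norms ~ 1.7, above clip
grads = np.vstack([g_majority, g_minority])
_, scale = dp_sgd_step(grads, clip_norm=1.0, noise_mult=1.1, rng=rng)
majority_kept = scale[:180].mean()   # fraction of gradient norm retained
minority_kept = scale[180:].mean()
```

The added noise then further drowns out the minority's already-attenuated contribution, since its standard deviation is set by the clip norm, not by each group's signal.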