AITopics | disaggregated evaluation

Collaborating Authors

disaggregated evaluation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SureMap: Simultaneous mean estimation for single-task and multi-task disaggregated evaluation

Neural Information Processing SystemsMar-19-2026, 04:40:48 GMT

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.97)

Add feedback

265d0413fa2efa41084f58b52a4a2a7f-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 21:15:46 GMT

estimator, evaluation, suremap, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California (0.04)
North America > Puerto Rico (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
(3 more...)

Add feedback

Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness

Pfohl, Stephen R., Harris, Natalie, Nagpal, Chirag, Madras, David, Mhasawade, Vishwali, Salaudeen, Olawale, Dieng, Awa, Sequeira, Shannon, Arciniegas, Santiago, Sung, Lillian, Ezeanochie, Nnamdi, Cole-Lewis, Heather, Heller, Katherine, Koyejo, Sanmi, D'Amour, Alexander

arXiv.org Machine LearningJun-5-2025

Disaggregated evaluation across subgroups is critical for assessing the fairness of machine learning models, but its uncritical use can mislead practitioners. We show that equal performance across subgroups is an unreliable measure of fairness when data are representative of the relevant populations but reflective of real-world disparities. Furthermore, when data are not representative due to selection bias, both disaggregated evaluation and alternative approaches based on conditional independence testing may be invalid without explicit assumptions regarding the bias mechanism. We use causal graphical models to predict metric stability across subgroups under different data generating processes. Our framework suggests complementing disaggregated evaluations with explicit causal assumptions and analysis to control for confounding and distribution shift, including conditional independence testing and weighted performance estimation. These findings have broad implications for how practitioners design and interpret model assessments given the ubiquity of disaggregated evaluation.

data mining, machine learning, prediction, (17 more...)

arXiv.org Machine Learning

2506.04193

Country:

North America > United States > California > Santa Clara County > Mountain View (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(8 more...)

Genre: Research Report > New Finding (0.30)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

SureMap: Simultaneous mean estimation for single-task and multi-task disaggregated evaluation

Neural Information Processing SystemsMay-26-2025, 19:09:11 GMT

Disaggregated evaluation--estimation of performance of a machine learning model on different subpopulations--is a core task when assessing performance and group-fairness of AI systems.A key challenge is that evaluation data is scarce, and subpopulations arising from intersections of attributes (e.g., race, sex, age) are often tiny.Today, it is common for multiple clients to procure the same AI model from a model developer, and the task of disaggregated evaluation is faced by each customer individually. This gives rise to what we call the multi-task disaggregated evaluation problem, wherein multiple clients seek to conduct a disaggregated evaluation of a given model in their own data setting (task). In this work we develop a disaggregated evaluation method called SureMap that has high estimation accuracy for both multi-task and single-task disaggregated evaluations of blackbox models. SureMap's efficiency gains come from(1) transforming the problem into structured simultaneous Gaussian mean estimation and (2) incorporating external data, e.g., from the AI system creator or from their other clients. Our method combines maximum a posteriori (MAP) estimation using a well-chosen prior together with cross-validation-free tuning via Stein's unbiased risk estimate (SURE).We evaluate SureMap on disaggregated evaluation tasks in multiple domains, observing significant accuracy improvements over several strong competitors.

artificial intelligence, disaggregated evaluation, machine learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation

Khodak, Mikhail, Mackey, Lester, Chouldechova, Alexandra, Dudík, Miroslav

arXiv.org Machine LearningNov-14-2024

Disaggregated evaluation -- estimation of performance of a machine learning model on different subpopulations -- is a core task when assessing performance and group-fairness of AI systems. A key challenge is that evaluation data is scarce, and subpopulations arising from intersections of attributes (e.g., race, sex, age) are often tiny. Today, it is common for multiple clients to procure the same AI model from a model developer, and the task of disaggregated evaluation is faced by each customer individually. This gives rise to what we call the multi-task disaggregated evaluation problem, wherein multiple clients seek to conduct a disaggregated evaluation of a given model in their own data setting (task). In this work we develop a disaggregated evaluation method called SureMap that has high estimation accuracy for both multi-task and single-task disaggregated evaluations of blackbox models. SureMap's efficiency gains come from (1) transforming the problem into structured simultaneous Gaussian mean estimation and (2) incorporating external data, e.g., from the AI system creator or from their other clients. Our method combines maximum a posteriori (MAP) estimation using a well-chosen prior together with cross-validation-free tuning via Stein's unbiased risk estimate (SURE). We evaluate SureMap on disaggregated evaluation tasks in multiple domains, observing significant accuracy improvements over several strong competitors.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2411.0973

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California (0.04)
North America > Puerto Rico (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
(2 more...)

Add feedback

A structured regression approach for evaluating model performance across intersectional subgroups

Herlihy, Christine, Truong, Kimberly, Chouldechova, Alexandra, Dudik, Miroslav

arXiv.org Artificial IntelligenceJan-26-2024

Disaggregated evaluation is a central task in AI fairness assessment, with the goal to measure an AI system's performance across different subgroups defined by combinations of demographic or other sensitive attributes. The standard approach is to stratify the evaluation data across subgroups and compute performance metrics separately for each group. However, even for moderately-sized evaluation datasets, sample sizes quickly get small once considering intersectional subgroups, which greatly limits the extent to which intersectional groups are considered in many disaggregated evaluations. In this work, we introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups. We also provide corresponding inference strategies for constructing confidence intervals and explore how goodness-of-fit testing can yield insight into the structure of fairness-related harms experienced by intersectional groups. We evaluate our approach on two publicly available datasets, and several variants of semi-synthetic data. The results show that our method is considerably more accurate than the standard approach, especially for small subgroups, and goodness-of-fit testing helps identify the key factors that drive differences in performance.

confidence interval, dataset, subgroup, (15 more...)

arXiv.org Artificial Intelligence

2401.14893

Country:

North America > United States > Oregon (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area (0.94)

Add feedback

Assessing the Fairness of AI Systems: AI Practitioners' Processes, Challenges, and Needs for Support

Madaio, Michael, Egede, Lisa, Subramonyam, Hariharan, Vaughan, Jennifer Wortman, Wallach, Hanna

arXiv.org Artificial IntelligenceDec-10-2021

Various tools and practices have been developed to support practitioners in identifying, assessing, and mitigating fairness-related harms caused by AI systems. However, prior research has highlighted gaps between the intended design of these tools and practices and their use within particular contexts, including gaps caused by the role that organizational factors play in shaping fairness work. In this paper, we investigate these gaps for one such practice: disaggregated evaluations of AI systems, intended to uncover performance disparities between demographic groups. By conducting semi-structured interviews and structured workshops with thirty-three AI practitioners from ten teams at three technology companies, we identify practitioners' processes, challenges, and needs for support when designing disaggregated evaluations. We find that practitioners face challenges when choosing performance metrics, identifying the most relevant direct stakeholders and demographic groups on which to focus, and collecting datasets with which to conduct disaggregated evaluations. More generally, we identify impacts on fairness work stemming from a lack of engagement with direct stakeholders, business imperatives that prioritize customers over marginalized groups, and the drive to deploy AI systems at scale.

ai system, disaggregated evaluation, participant, (10 more...)

arXiv.org Artificial Intelligence

2112.05675

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(8 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Applied AI (0.86)
Information Technology > Communications > Social Media (0.68)

Add feedback

Designing Disaggregated Evaluations of AI Systems: Choices, Considerations, and Tradeoffs

Barocas, Solon, Guo, Anhong, Kamar, Ece, Krones, Jacquelyn, Morris, Meredith Ringel, Vaughan, Jennifer Wortman, Wadsworth, Duncan, Wallach, Hanna

arXiv.org Artificial IntelligenceMar-10-2021

Several pieces of work have uncovered performance disparities by conducting "disaggregated evaluations" of AI systems. We build on these efforts by focusing on the choices that must be made when designing a disaggregated evaluation, as well as some of the key considerations that underlie these design choices and the tradeoffs between these considerations. We argue that a deeper understanding of the choices, considerations, and tradeoffs involved in designing disaggregated evaluations will better enable researchers, practitioners, and the public to understand the ways in which AI systems may be underperforming for particular groups of people.

disaggregated evaluation, evaluation, performance disparity, (16 more...)

arXiv.org Artificial Intelligence

2103.06076

Country:

Oceania > Australia (0.28)
North America > United States > New York (0.04)
North America > United States > Michigan (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Applied AI (0.70)

Add feedback