SureMap: Simultaneous mean estimation for single-task and multi-task disaggregated evaluation

Neural Information Processing Systems 

Disaggregated evaluation, the estimation of a machine learning model's performance on different subpopulations, is a core task when assessing the performance and group fairness of AI systems. A key challenge is that evaluation data is scarce, and subpopulations arising from intersections of attributes (e.g., race, sex, age) are often tiny. Today, it is common for multiple clients to procure the same AI model from a model developer, and each client faces the task of disaggregated evaluation individually. This gives rise to what we call the multi-task disaggregated evaluation problem, wherein multiple clients each seek to conduct a disaggregated evaluation of a given model in their own data setting (task).