Retiring Adult: New Datasets for Fair Machine Learning
–Neural Information Processing Systems
Although the fairness community has recognized the importance of data, researchers in the area primarily rely on UCI Adult when it comes to tabular data. Derived from a 1994 US Census survey, this dataset has appeared in hundreds of research papers where it served as the basis for the development and comparison of many algorithmic fairness interventions. We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity. Our primary contribution is a suite of new datasets derived from US Census surveys that extend the existing data ecosystem for research on fair machine learning. We create prediction tasks relating to income, employment, health, transportation, and housing. The data span multiple years and all states of the United States, allowing researchers to study temporal shift and geographic variation. We highlight a broad initial sweep of new empirical insights relating to trade-offs between fairness criteria, performance of algorithmic interventions, and the role of distribution shift based on our new datasets. Our findings inform ongoing debates, challenge some existing narratives, and point to future research directions.
Apr-25-2026, 10:10:23 GMT
- Country:
- North America > United States (1.00)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Law (1.00)
- Information Technology (0.68)
- Government > Regional Government
- Technology: