Measuring Data
Mitchell, Margaret; Luccioni, Alexandra Sasha; Lambert, Nathan; Gerchick, Marissa; McMillan-Major, Angelina; Ozoani, Ezinwanne; Rajani, Nazneen; Thrush, Tristan; Jernite, Yacine; Kiela, Douwe
arXiv.org Artificial Intelligence
We identify the task of measuring data to quantitatively characterize the composition of machine learning data and datasets. Similar to an object's height, width, and volume, data measurements quantify different attributes of data along common dimensions that support comparison. Several lines of research have proposed what we refer to as measurements, with differing terminology; we bring some of this work together, particularly in the fields of computer vision and language, and build from it to motivate measuring data as a critical component of responsible AI development. Measuring data aids in systematically building and analyzing machine learning (ML) data towards specific goals and in gaining better control of what modern ML systems will learn. We conclude with a discussion of the many avenues of future work, the limitations of data measurements, and how to leverage these measurement approaches in research and practice.
Feb-13-2023
- Country:
  - North America
    - United States
      - Michigan (0.04)
      - New York > New York County
        - New York City (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Georgia > Fulton County
        - Atlanta (0.04)
      - California
        - Santa Clara County > Palo Alto (0.04)
        - Alameda County > Berkeley (0.04)
    - Canada > Quebec
      - Montreal (0.04)
  - Europe
    - United Kingdom > England
      - Oxfordshire > Oxford (0.04)
      - Greater Manchester > Manchester (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - Netherlands > South Holland
      - Leiden (0.04)
    - Italy > Piedmont
      - Turin Province > Turin (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Germany > Baden-Württemberg
      - Tübingen Region > Tübingen (0.04)
- Genre:
  - Overview (0.46)
  - Research Report (0.40)
- Industry:
- Technology: