Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data
Esther Rolf, Theodora Worledge, Benjamin Recht, Michael I. Jordan
Datasets play a critical role in shaping the perception of performance and progress in machine learning (ML)--the way we collect, process, and analyze data affects the way we benchmark success and form new research agendas (Paullada et al., 2020; Dotan & Milli, 2020). A growing appreciation of this determinative role of datasets has sparked a concomitant concern that standard datasets used for training and evaluating ML models lack diversity along significant dimensions, for example, geography, gender, and skin type (Shankar et al., 2017; Buolamwini & Gebru, 2018). Lack of diversity in evaluation data can obfuscate disparate performance when evaluating based on aggregate accuracy (Buolamwini & Gebru, 2018). Lack of diversity in training data can limit the extent to which learned models can adequately apply to all portions of a population, a concern highlighted in recent work in the medical domain (Habib et al., 2019; Hofmanninger et al., 2020). Our work aims to develop a general unifying perspective on the way that dataset composition affects outcomes of machine learning systems.
March 4, 2021