SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata
Díaz, Mark, Dev, Sunipa, Reif, Emily, Denton, Emily, Prabhakaran, Vinodkumar
arXiv.org Artificial Intelligence
The unstructured nature of data used in foundation model development is a challenge to systematic analyses for making data use and documentation decisions. From a Responsible AI perspective, these decisions often rely upon understanding how people are represented in data. We propose a framework designed to guide analysis of human representation in unstructured data and identify downstream risks. We apply the framework in two toy examples using the Common Crawl web text corpus (C4) and LAION-400M. We also propose a set of hypothetical action steps in service of dataset use, development, and documentation.
Dec-1-2023
- Country:
  - Asia
    - India (0.04)
    - Middle East > Jordan (0.04)
    - Philippines (0.04)
  - Europe > Italy
  - North America
    - Central America (0.04)
    - Dominican Republic (0.04)
    - United States > New York
      - New York County > New York City (0.04)
  - South America (0.04)
- Genre:
  - Research Report (0.82)
- Industry:
  - Health & Medicine (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Issues > Social & Ethical Issues (0.68)
    - Machine Learning > Neural Networks (0.67)
    - Natural Language (1.00)
    - Vision (1.00)