SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata
Díaz, Mark; Dev, Sunipa; Reif, Emily; Denton, Emily; Prabhakaran, Vinodkumar
arXiv.org Artificial Intelligence
The unstructured nature of data used in foundation model development is a challenge to systematic analyses for making data use and documentation decisions. From a Responsible AI perspective, these decisions often rely upon understanding how people are represented in data. We propose a framework designed to guide analysis of human representation in unstructured data and identify downstream risks. We apply the framework in two toy examples using the Common Crawl web text corpus (C4) and LAION-400M. We also propose a set of hypothetical action steps in service of dataset use, development, and documentation.
Dec-1-2023
- Country:
  - South America (0.04)
  - North America
    - Dominican Republic (0.04)
    - Central America (0.04)
    - United States > New York
      - New York County > New York City (0.04)
  - Europe > Italy
  - Asia
    - Philippines (0.04)
    - Middle East > Jordan (0.04)
    - India (0.04)
- Genre:
  - Research Report (0.82)
- Industry:
  - Health & Medicine (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Natural Language (1.00)
    - Issues > Social & Ethical Issues (0.68)
    - Machine Learning > Neural Networks (0.67)