SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata
Mark Díaz, Sunipa Dev, Emily Reif, Emily Denton, Vinodkumar Prabhakaran
arXiv.org Artificial Intelligence
The unstructured nature of data used in foundation model development is a challenge to systematic analyses for making data use and documentation decisions. From a Responsible AI perspective, these decisions often rely upon understanding how people are represented in data. We propose a framework designed to guide analysis of human representation in unstructured data and identify downstream risks. We apply the framework in two toy examples using the Common Crawl web text corpus (C4) and LAION-400M. We also propose a set of hypothetical action steps in service of dataset use, development, and documentation.
Dec-1-2023
- Country:
  - North America > United States (0.46)
- Genre:
  - Research Report (0.82)
- Industry:
  - Health & Medicine (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Issues > Social & Ethical Issues (0.68)
    - Machine Learning > Neural Networks (0.67)
    - Natural Language (1.00)
    - Vision (1.00)