The Case for "Thick Evaluations" of Cultural Representation in AI
Qadri, Rida, Diaz, Mark, Wang, Ding, Madaio, Michael
To address these gaps, prior work has sought to evaluate the cultural representations within AI-generated output, but with few exceptions [30, 67], mostly through quantified, metricized approaches to representation such as statistical similarities and benchmark-style scoring [49, 84]. However, the use of these methods presumes that representation is an objective construct with an empirical, definitive ground truth that outputs can be compared against [e.g., 42, 84] [for a critique of ground truth, see 59]. Given the limitations of these computational methods, evaluation of representation is reduced to basic recognition or factual generation of artifacts. Even when human feedback on representation is sought, it is solicited through narrow, constrained, quantitative scales from anonymized crowdworkers who often do not have the lived experiences to evaluate the nuances of cultural representation of other cultures. However, this approach to measuring representation contravenes decades of scholarship in the social sciences that emphasizes the subjective nature of representation, where judgments about representation in visual media are constructed in conversation with the viewer's lived experiences and the broader context within which an image is viewed.
Risks of Cultural Erasure in Large Language Models
Qadri, Rida, Davani, Aida M., Robinson, Kevin, Prabhakaran, Vinodkumar
Large language models are increasingly being integrated into applications that shape the production and discovery of societal knowledge, such as search, online education, and travel planning. As a result, language models will shape how people learn about, perceive, and interact with global cultures, making it important to consider whose knowledge systems and perspectives are represented in models. Recognizing this importance, work in Machine Learning and NLP has increasingly focused on evaluating gaps in the distribution of global cultural representation within outputs. However, more work is needed on developing benchmarks for cross-cultural impacts of language models that stem from a nuanced, sociologically-aware conceptualization of cultural impact or harm. We join this line of work, arguing for the need for metricizable evaluations of language technologies that interrogate and account for historical power inequities and the differential impacts of representation on global cultures, particularly for cultures already under-represented in digital corpora. We examine two concepts of erasure: omission, where cultures are not represented at all, and simplification, where cultural complexity is erased by presenting one-dimensional views of a rich culture. The former focuses on whether something is represented, the latter on how it is represented. We focus our analysis on two task contexts with the potential to influence global cultural production. First, we probe the representations that a language model produces about different places around the world when asked to describe these contexts. Second, we analyze the cultures represented in the travel recommendations produced by a set of language model applications. Our study shows ways in which the NLP community and application developers can begin to operationalize complex socio-cultural considerations into standard evaluations and benchmarks.
Dialogue with the Machine and Dialogue with the Art World: Evaluating Generative AI for Culturally-Situated Creativity
Qadri, Rida, Mirowski, Piotr, Gabriellan, Aroussiak, Mehr, Farbod, Gupta, Huma, Karimi, Pamela, Denton, Remi
This paper proposes dialogue as a method for evaluating generative AI tools for culturally-situated creative practice, one that recognizes the socially situated nature of art. Drawing on sociologist Howard Becker's concept of Art Worlds, this method expands the scope of traditional AI and creativity evaluations beyond benchmarks, user studies with crowdworkers, or focus groups conducted with artists. Our method involves two mutually informed dialogues: 1) 'dialogues with art worlds,' placing artists in conversation with experts such as art historians, curators, and archivists, and 2) 'dialogues with the machine,' facilitated through structured artist- and critic-led experimentation with state-of-the-art generative AI tools. We demonstrate the value of this method through a case study with artists and experts steeped in non-western art worlds, specifically the Persian Gulf. We trace how these dialogues help create culturally rich and situated forms of evaluation of the representational possibilities of generative AI that mimic the reception of generative artwork in the broader art ecosystem. Putting artists in conversation with commentators also allows artists to shift their use of the tools in response to their cultural and creative context. Our study can provide generative AI researchers with an understanding of the complex dynamics of technology, human creativity, and the socio-politics of art worlds, toward building more inclusive machines for diverse art worlds.
Beyond the Surface: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation
Jha, Akshita, Prabhakaran, Vinodkumar, Denton, Remi, Laszlo, Sarah, Dave, Shachi, Qadri, Rida, Reddy, Chandan K., Dev, Sunipa
Recent studies have highlighted the issue of stereotypical depictions of people from different identity groups in Text-to-Image (T2I) model generations. However, these existing approaches have several key limitations, including a noticeable lack of coverage of global identity groups and of the range of stereotypes associated with them. Additionally, they often lack a critical distinction between inherently visual stereotypes, such as 'underweight' or 'sombrero', and culturally dependent stereotypes like 'attractive' or 'terrorist'. In this work, we address these limitations with a multifaceted approach that leverages existing textual resources to ground our evaluation of geo-cultural stereotypes in the images generated by T2I models. We employ existing stereotype benchmarks to identify and evaluate visual stereotypes at a global scale, spanning 135 nationality-based identity groups. We demonstrate that stereotypical attributes are three times as likely to be present in images of these identities as other attributes. We further investigate how disparately offensive the generated depictions are for different nationalities. Finally, through a detailed case study, we reveal how the 'default' representations of all identity groups have a stereotypical appearance. Moreover, for the Global South, images across different attributes are visually similar, even when explicitly prompted otherwise. CONTENT WARNING: Some examples may contain offensive stereotypes.
AI's Regimes of Representation: A Community-centered Study of Text-to-Image Models in South Asia
Qadri, Rida, Shelby, Renee, Bennett, Cynthia L., Denton, Emily
This paper presents a community-centered study of the cultural limitations of text-to-image (T2I) models in the South Asian context. We theorize these failures using scholarship on dominant media regimes of representation and locate them within participants' reporting of their existing social marginalizations. We thus show how generative AI can reproduce an outsider's gaze on South Asian cultures, shaped by global and regional power inequities. By centering communities as experts and soliciting their perspectives on T2I limitations, our study adds rich nuance to existing evaluative frameworks and deepens our understanding of the culturally-specific ways AI technologies can fail in non-Western and Global South settings. We distill lessons for the responsible development of T2I models, recommending concrete pathways forward that can allow for recognition of structural inequalities.
Cultural Incongruencies in Artificial Intelligence
Prabhakaran, Vinodkumar, Qadri, Rida, Hutchinson, Ben
Artificial intelligence (AI) systems attempt to imitate human behavior. How well they do this imitation is often used to assess their utility and to attribute human-like (or artificial) intelligence to them. However, most work on AI refers to and relies on human intelligence without accounting for the fact that human behavior is inherently shaped by the cultural contexts people are embedded in, the values and beliefs they hold, and the social practices they follow. Additionally, since AI technologies are mostly conceived and developed in just a handful of countries, they embed the cultural values and practices of those countries. Similarly, the data used to train the models also fails to equitably represent global cultural diversity. Problems therefore arise when these technologies interact with globally diverse societies and cultures that hold different values and interpretive practices. In this position paper, we describe a set of cultural dependencies and incongruencies in the context of AI-based language and vision technologies, and reflect on the possibilities of and potential strategies for addressing these incongruencies.