OASIS Uncovers: High-Quality T2I Models, Same Old Stereotypes

Sepehr Dehdashtian, Gautam Sreekumar, Vishnu Naresh Boddeti

arXiv.org Artificial Intelligence 

Images generated by text-to-image (T2I) models often exhibit visual biases and stereotypes of concepts such as culture and profession. Existing quantitative measures of stereotypes are based on statistical parity, which does not align with the sociological definition of stereotypes, and they therefore incorrectly categorize biases as stereotypes. Instead of oversimplifying stereotypes as biases, we propose a quantitative measure of stereotypes that aligns with their sociological definition. We then propose OASIS to measure the stereotypes in a generated dataset and understand their origins within the T2I model. OASIS includes a score that measures spectral variance in the images along a stereotypical attribute. OASIS also includes two methods to understand the origins of stereotypes in T2I models: (U1) StOP to discover attributes that the T2I model internally associates with a given concept, and (U2) SPI to quantify the emergence of stereotypical attributes in the latent space of the T2I model during image generation. Despite the considerable progress in image fidelity, using OASIS, we conclude that newer T2I models such as FLUX.1 and SDv3 contain strong stereotypical predispositions about concepts and still generate images with widespread stereotypical attributes.

In a sociological context, stereotypes are generalized beliefs or assumptions about a particular group of people, things, or categories (Bordalo et al., 2016). For instance, consider the images in Figure 1 generated by FLUX.1 (BlackForestLabs, 2024), SDv3 (Esser et al., 2024), and SDv2 (Rombach et al., 2022) for the prompt "A photo of a/an [nationality] person". They clearly portray ethnic stereotypes in attributes such as clothing, skin tone, and facial features across different nationalities, even though the prompt makes no reference to such attributes. For example, the model consistently depicts an Iranian person as middle-aged or elderly, with a long beard, wearing a turban, and dressed in religious attire, reinforcing harmful stereotypes about people of Iranian nationality. Besides being demographically inaccurate, stereotypical biases in these models can lead to broader harm.
