
DALL-E



Evaluating and comparing gender bias across four text-to-image models

Hammad, Zoya, Sowah, Nii Longdon

arXiv.org Artificial Intelligence

SUMMARY: As we increasingly use Artificial Intelligence (AI) in decision-making for industries like healthcare, finance, e-commerce, and even entertainment, it is crucial to reflect on the ethical aspects of AI, for example the inclusivity and fairness of the information it provides. In this work, we evaluated different text-to-image AI models and compared the degree of gender bias each presents. The evaluated models were Stable Diffusion XL (SDXL), Stable Diffusion Cascade (SC), DALL-E, and Emu. We hypothesized that DALL-E and Stable Diffusion, which are comparatively older models, would exhibit a noticeable degree of gender bias towards men, while Emu, recently released by Meta AI, would produce more balanced results. As hypothesized, we found that both Stable Diffusion models exhibit a noticeable degree of gender bias, while Emu demonstrated more balanced results (i.e., less gender bias). Interestingly, however, OpenAI's DALL-E exhibited almost the opposite pattern: the ratio of women to men was significantly higher in most cases tested. Although we still observed a bias here, it favored women over men. This may be explained by the fact that OpenAI modifies prompts on its backend, as we observed during our experiment. We also observed that Meta AI's Emu utilized user information when generating images via WhatsApp. Finally, we proposed some potential solutions to mitigate such biases, including ensuring diversity across AI research teams and using diverse datasets.

INTRODUCTION: Artificial Intelligence (AI) has been growing remarkably in recent years, impacting numerous aspects of our daily lives. One area of significant advancement is text-to-image generation.



From job titles to jawlines: Using context voids to study generative AI systems

Memon, Shahan Ali, De, Soham, Kang, Sungha, Mujtaba, Riyan, AlShebli, Bedoor, Davis, Katie, Snyder, Jaime, West, Jevin D.

arXiv.org Artificial Intelligence

In this paper, we introduce a speculative design methodology for studying the behavior of generative AI systems, framing design as a mode of inquiry. We propose bridging seemingly unrelated domains to generate intentional context voids, using these tasks as probes to elicit AI model behavior. We demonstrate this through a case study: probing the ChatGPT system (GPT-4 and DALL-E) to generate headshots from professional Curricula Vitae (CVs). In contrast to traditional evaluation methods, our approach assesses system behavior under conditions of radical uncertainty -- when forced to invent entire swaths of missing context -- revealing subtle stereotypes and value-laden assumptions. We qualitatively analyze how the system interprets identity and competence markers from CVs, translating them into visual portraits despite the missing context (i.e., physical descriptors). We show that within this context void, the AI system generates biased representations, potentially relying on stereotypical associations or blatant hallucinations.


OpenAI releases impressive 4o image generator for free and paid users

PCWorld

Earlier this week, OpenAI released its "most advanced image generator yet" and made it available through ChatGPT using the GPT-4o model. ChatGPT previously relied on DALL-E to generate images. According to OpenAI, the improved 4o model can produce precise, accurate, and photorealistic results. The company claims it is also particularly good at rendering text, following instructions precisely, and even understanding the context of a chat. The model can also transform uploaded images or use uploaded images as visual inspiration.


Effect of Gender Fair Job Description on Generative AI Images

Böckling, Finn, Marquenie, Jan, Siegert, Ingo

arXiv.org Artificial Intelligence

STEM fields are traditionally male-dominated, with gender biases shaping perceptions of job accessibility. This study analyzed gender representation in STEM occupation images generated by OpenAI's DALL-E 3 and Black Forest Labs' FLUX.1, using 150 prompts in three linguistic forms: German generic masculine, German pair form, and English. As a control, 20 images of social occupations were generated as well. Results revealed significant male bias across all forms; the German pair form reduced the bias but still overrepresented men for the STEM group, with mixed results for the group of social occupations. These findings highlight generative AI's role in reinforcing societal biases, emphasizing the need for further discussion on diversity in AI. Age distribution and ethnic diversity were also analyzed.


Owls are wise and foxes are unfaithful: Uncovering animal stereotypes in vision-language models

Aman, Tabinda, Nadeem, Mohammad, Sohail, Shahab Saquib, Anas, Mohammad, Cambria, Erik

arXiv.org Artificial Intelligence

Generative artificial intelligence (GAI) has seen rapid adoption across diverse domains through its ability to produce high-quality text, images, and videos [1]. Vision-Language Models (VLMs) represent a significant advancement in this space, combining visual and linguistic understanding to generate contextually relevant images from textual descriptions [2]. They leverage vast datasets and sophisticated algorithms [2,3] to enable unprecedented creativity and efficiency, driving applications in marketing, entertainment, design, and more. Large Language Models (LLMs) and VLMs often inherit and perpetuate biases and stereotypes present in their training data [4-7], which is typically sourced from vast and diverse internet repositories [8-11]. The training datasets frequently contain implicit and explicit cultural stereotypes, societal biases, and skewed representations that the models learn during training.


Surrealistic-like Image Generation with Vision-Language Models

Ayten, Elif, Wang, Shuai, Snoep, Hjalmar

arXiv.org Artificial Intelligence

Recent advances in generative AI make it convenient to create different types of content, including text, images, and code. In this paper, we explore the generation of images in the style of paintings from the surrealism movement using vision-language generative models, including DALL-E, Deep Dream Generator, and DreamStudio. Our investigation starts with generating images under various image generation settings and with different models. The primary objective is to identify the most suitable model and settings for producing such images. Additionally, we aim to understand the impact of using edited base images on the resulting generated images. Through these experiments, we evaluate the performance of the selected models and gain valuable insights into their capabilities in generating such images. Our analysis shows that DALL-E 2 performs best when using prompts generated by ChatGPT.


OpenAI Poaches 3 Top Engineers From DeepMind

WIRED

OpenAI announced today it has hired three senior computer vision and machine learning engineers from rival Google DeepMind, all of whom will work in a newly opened OpenAI office in Zurich, Switzerland. OpenAI executives told staff in an internal memo on Tuesday that Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai will be joining the company to work on multimodal AI, artificial intelligence models capable of performing tasks in different mediums ranging from images to audio. OpenAI has long been at the forefront of multimodal AI and released the first version of its text-to-image platform Dall-E in 2021. Its flagship chatbot ChatGPT, however, was initially only capable of interacting with text inputs. The company later added voice and image features as multimodal functionality became an increasingly important part of its product line and AI research.


Why A.I. Isn't Going to Make Art

The New Yorker

In 1953, Roald Dahl published "The Great Automatic Grammatizator," a short story about an electrical engineer who secretly desires to be a writer. One day, after completing construction of the world's fastest calculating machine, the engineer realizes that "English grammar is governed by rules that are almost mathematical in their strictness." He constructs a fiction-writing machine that can produce a five-thousand-word short story in thirty seconds; a novel takes fifteen minutes and requires the operator to manipulate handles and foot pedals, as if he were driving a car or playing an organ, to regulate the levels of humor and pathos. The resulting novels are so popular that, within a year, half the fiction published in English is a product of the engineer's invention. Is there anything about art that makes us think it can't be created by pushing a button, as in Dahl's imagination?