fe4b8556000d0f0cae99daa5c5c5a410-AuthorFeedback.pdf

Neural Information Processing Systems

This makes ROAR more reliable. Reviewer 1 (R1) re: portrayal of human studies: R1 correctly points out our portrayal of human studies requires more nuance. We would be glad to correct this and will update the manuscript accordingly. As the reviewer assumed correctly, the gap between estimators is far larger than the variance. But as the reviewer points out, sometimes the curve itself provides additional information. This minimum deletion area is identified by perturbing and evaluating the model output without retraining.
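A minimal sketch of the deletion-style evaluation the excerpt alludes to: features are removed in order of attributed importance and the model output is re-evaluated after each step, with no retraining. The `model_fn` callable, the attribution format, and the baseline value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def deletion_curve(model_fn, x, attribution, n_steps=20, baseline=0.0):
    """Delete the most-attributed features first and record the model
    output after each deletion step (no retraining involved)."""
    order = np.argsort(attribution)[::-1]     # most important first
    x_pert = x.astype(float).copy()
    scores = [model_fn(x_pert)]
    step = max(1, len(order) // n_steps)
    for i in range(0, len(order), step):
        x_pert[order[i:i + step]] = baseline  # "delete" these features
        scores.append(model_fn(x_pert))
    return np.array(scores)

def minimum_deletion_area(scores):
    """Normalized area under the deletion curve; a smaller area means the
    attribution located the decisive features sooner."""
    return float(np.mean(scores))
```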


Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Yi, Zihao, Jiang, Qingxuan, Ma, Ruotian, Chen, Xingyu, Yang, Qu, Wang, Mengru, Ye, Fanghua, Shen, Ying, Tu, Zhaopeng, Li, Xiaolong, Linus

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly tasked with creative generation, including the simulation of fictional characters. However, their ability to portray non-prosocial, antagonistic personas remains largely unexamined. We hypothesize that the safety alignment of modern LLMs creates a fundamental conflict with the task of authentically role-playing morally ambiguous or villainous characters. To investigate this, we introduce the Moral RolePlay benchmark, a new dataset featuring a four-level moral alignment scale and a balanced test set for rigorous evaluation. We task state-of-the-art LLMs with role-playing characters from moral paragons to pure villains. Our large-scale evaluation reveals a consistent, monotonic decline in role-playing fidelity as character morality decreases. We find that models struggle most with traits directly antithetical to safety principles, such as "Deceitful" and "Manipulative", often substituting nuanced malevolence with superficial aggression. Furthermore, we demonstrate that general chatbot proficiency is a poor predictor of villain role-playing ability, with highly safety-aligned models performing particularly poorly. Our work provides the first systematic evidence of this critical limitation, highlighting a key tension between model safety and creative fidelity. Our benchmark and findings pave the way for developing more nuanced, context-aware alignment methods.
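A schematic sketch of the evaluation loop the abstract describes, grouping characters by the four-level moral alignment scale and scoring role-play fidelity per level. The prompt wording and the `generate` and `judge_fidelity` callables are placeholder assumptions, not the released benchmark code.

```python
from statistics import mean

MORAL_LEVELS = [1, 2, 3, 4]  # 1 = moral paragon ... 4 = pure villain

def evaluate_model(generate, judge_fidelity, characters_by_level, scenes):
    """Score role-play fidelity per moral level; the paper reports a
    monotonic decline in fidelity as the level approaches pure villain."""
    fidelity_by_level = {}
    for level in MORAL_LEVELS:
        scores = []
        for ch in characters_by_level[level]:
            for scene in scenes:
                reply = generate(
                    f"Role-play {ch['name']}, described as: {ch['profile']}.\n"
                    f"Scene: {scene}")
                scores.append(judge_fidelity(ch, scene, reply))
        fidelity_by_level[level] = mean(scores)
    return fidelity_by_level
```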


From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models

Cai, Zefan, Qiu, Haoyi, Zhao, Haozhe, Wan, Ke, Li, Jiachen, Gu, Jiuxiang, Xiao, Wen, Peng, Nanyun, Hu, Junjie

arXiv.org Artificial Intelligence

Recent advances in video diffusion models have significantly enhanced text-to-video generation, particularly through alignment tuning using reward models trained on human preferences. While these methods improve visual quality, they can unintentionally encode and amplify social biases. To systematically trace how such biases evolve throughout the alignment pipeline, we introduce VideoBiasEval, a comprehensive diagnostic framework for evaluating social representation in video generation. Grounded in established social bias taxonomies, VideoBiasEval employs an event-based prompting strategy to disentangle semantic content (actions and contexts) from actor attributes (gender and ethnicity). It further introduces multi-granular metrics to evaluate (1) overall ethnicity bias, (2) gender bias conditioned on ethnicity, (3) distributional shifts in social attributes across model variants, and (4) the temporal persistence of bias within videos. Using this framework, we conduct the first end-to-end analysis connecting biases in human preference datasets, their amplification in reward models, and their propagation through alignment-tuned video diffusion models. Our results reveal that alignment tuning not only strengthens representational biases but also makes them temporally stable, producing smoother yet more stereotyped portrayals. These findings highlight the need for bias-aware evaluation and mitigation throughout the alignment process to ensure fair and socially responsible video generation.
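A minimal sketch of two of the multi-granular measurements described above, under assumed annotation formats (per-video attribute labels and per-frame labels). The metric choices here, total variation distance for distributional shift and a majority-agreement rate for temporal persistence, are plausible instantiations rather than the paper's exact definitions.

```python
from collections import Counter

def attribute_distribution(labels):
    """Empirical distribution of a social attribute (e.g., ethnicity)
    over a set of generated videos."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def distribution_shift(dist_a, dist_b):
    """Total variation distance between the attribute distributions of
    two model variants (e.g., before vs. after alignment tuning)."""
    keys = set(dist_a) | set(dist_b)
    return 0.5 * sum(abs(dist_a.get(k, 0.0) - dist_b.get(k, 0.0)) for k in keys)

def temporal_persistence(frame_labels):
    """Fraction of frames agreeing with the video-level majority label;
    values near 1.0 indicate a temporally stable depiction."""
    majority = Counter(frame_labels).most_common(1)[0][0]
    return sum(lab == majority for lab in frame_labels) / len(frame_labels)
```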


Prompting Away Stereotypes? Evaluating Bias in Text-to-Image Models for Occupations

Raza, Shaina, Powers, Maximus, Saha, Partha Pratim, Raza, Mahveen, Qureshi, Rizwan

arXiv.org Artificial Intelligence

Text-to-Image (TTI) models are powerful creative tools but risk amplifying harmful social biases. We frame representational societal bias assessment as an image curation and evaluation task and introduce a pilot benchmark of occupational portrayals spanning five socially salient roles (CEO, Nurse, Software Engineer, Teacher, Athlete). Using five state-of-the-art models: closed-source (DALLE 3, Gemini Imagen 4.0) and open-source (FLUX.1-dev, Stable Diffusion XL Turbo, Grok-2 Image), we compare neutral baseline prompts against fairness-aware controlled prompts designed to encourage demographic diversity. All outputs are annotated for gender (male, female) and race (Asian, Black, White), enabling structured distributional analysis. Results show that prompting can substantially shift demographic representations, but with highly model-specific effects: some systems diversify effectively, others overcorrect into unrealistic uniformity, and some show little responsiveness. These findings highlight both the promise and the limitations of prompting as a fairness intervention, underscoring the need for complementary model-level strategies. We release all code and data for transparency and reproducibility https://github.com/maximus-powers/img-gen-bias-analysis.
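To make the structured distributional analysis concrete, here is a small sketch comparing the diversity of annotated outputs under the two prompt conditions. The annotation tuples, the entropy-based diversity measure, and the sample counts below are hypothetical illustrations, not the released data or schema.

```python
from collections import Counter
import math

def demographic_entropy(annotations):
    """Shannon entropy of (gender, race) annotations for one role under
    one prompt condition; higher entropy means more diverse outputs."""
    counts = Counter(annotations)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical annotations for one role under the two prompt conditions.
neutral = [("male", "White")] * 8 + [("female", "Asian")] * 2
controlled = [("male", "White"), ("female", "Black"), ("female", "Asian"),
              ("male", "Black"), ("female", "White")] * 2

print(demographic_entropy(neutral))     # low diversity under the baseline prompt
print(demographic_entropy(controlled))  # higher diversity under the controlled prompt
```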



Visual Polarization Measurement Using Counterfactual Image Generation

Mosaffa, Mohammad, Rafieian, Omid, Yoganarasimhan, Hema

arXiv.org Artificial Intelligence

Political polarization is a significant issue in American politics, influencing public discourse, policy, and consumer behavior. While studies on polarization in news media have extensively focused on verbal content, non-verbal elements, particularly visual content, have received less attention due to the complexity and high dimensionality of image data. Traditional descriptive approaches often rely on feature extraction from images, leading to biased polarization estimates due to information loss. In this paper, we introduce the Polarization Measurement using Counterfactual Image Generation (PMCIG) method, which combines economic theory with generative models and multi-modal deep learning to fully utilize the richness of image data and provide a theoretically grounded measure of polarization in visual content. Applying this framework to a decade-long dataset featuring 30 prominent politicians across 20 major news outlets, we identify significant polarization in visual content, with notable variations across outlets and politicians. At the news outlet level, we observe significant heterogeneity in visual slant. Outlets such as Daily Mail, Fox News, and Newsmax tend to favor Republican politicians in their visual content, while The Washington Post, USA Today, and The New York Times exhibit a slant in favor of Democratic politicians. At the politician level, our results reveal substantial variation in polarized coverage, with Donald Trump and Barack Obama among the most polarizing figures, while Joe Manchin and Susan Collins are among the least. Finally, we conduct a series of validation tests demonstrating the consistency of our proposed measures with external measures of media slant that rely on non-image-based sources.


Entity Framing and Role Portrayal in the News

Mahmoud, Tarek, Xie, Zhuohan, Dimitrov, Dimitar, Nikolaidis, Nikolaos, Silvano, Purificação, Yangarber, Roman, Sharma, Shivam, Sartori, Elisa, Stefanovitch, Nicolas, Martino, Giovanni Da San, Piskorski, Jakub, Nakov, Preslav

arXiv.org Artificial Intelligence

We introduce a novel multilingual hierarchical corpus annotated for entity framing and role portrayal in news articles. The dataset uses a unique taxonomy inspired by storytelling elements, comprising 22 fine-grained roles, or archetypes, nested within three main categories: protagonist, antagonist, and innocent. Each archetype is carefully defined, capturing nuanced portrayals of entities such as guardian, martyr, and underdog for protagonists; tyrant, deceiver, and bigot for antagonists; and victim, scapegoat, and exploited for innocents. The dataset includes 1,378 recent news articles in five languages (Bulgarian, English, Hindi, European Portuguese, and Russian) focusing on two critical domains of global significance: the Ukraine-Russia War and Climate Change. Over 5,800 entity mentions have been annotated with role labels. This dataset serves as a valuable resource for research into role portrayal and has broader implications for news analysis. We describe the characteristics of the dataset and the annotation process, and we report evaluation results on fine-tuned state-of-the-art multilingual transformers and hierarchical zero-shot learning using LLMs at the level of a document, a paragraph, and a sentence.
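The two-level taxonomy lends itself to a simple data structure. The sketch below encodes the nine archetypes named in the abstract (the remaining thirteen are omitted, and the exact label strings are assumptions) together with the hierarchical-consistency check a nested classifier would need.

```python
# Condensed sketch of the two-level role taxonomy described above.
TAXONOMY = {
    "protagonist": ["guardian", "martyr", "underdog"],
    "antagonist":  ["tyrant", "deceiver", "bigot"],
    "innocent":    ["victim", "scapegoat", "exploited"],
}

ROLE_TO_CATEGORY = {role: cat for cat, roles in TAXONOMY.items() for role in roles}

def is_hierarchically_consistent(pred_category, pred_role):
    """A predicted fine-grained role must nest under the predicted main
    category -- a natural sanity check for hierarchical classifiers."""
    return ROLE_TO_CATEGORY.get(pred_role) == pred_category

assert is_hierarchically_consistent("antagonist", "deceiver")
assert not is_hierarchically_consistent("protagonist", "tyrant")
```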


Owls are wise and foxes are unfaithful: Uncovering animal stereotypes in vision-language models

Aman, Tabinda, Nadeem, Mohammad, Sohail, Shahab Saquib, Anas, Mohammad, Cambria, Erik

arXiv.org Artificial Intelligence

Generative artificial intelligence (GAI) has seen rapid adoption across diverse domains through its ability to produce high-quality text, images, and videos [1]. Vision-Language Models (VLMs) represent a significant advancement in this space, combining visual and linguistic understanding to generate contextually relevant images from textual descriptions [2]. They leverage vast datasets and sophisticated algorithms [2,3] to enable unprecedented creativity and efficiency, driving applications in marketing, entertainment, design, and more. Large Language Models (LLMs) and VLMs often inherit and perpetuate biases and stereotypes present in their training data [4-7], which is typically sourced from vast and diverse internet repositories [8-11]. The training datasets frequently contain implicit and explicit cultural stereotypes, societal biases, and skewed representations that the models learn during training.


Show, Don't Tell: Uncovering Implicit Character Portrayal using LLMs

Jaipersaud, Brandon, Zhu, Zining, Rudzicz, Frank, Creager, Elliot

arXiv.org Artificial Intelligence

Tools for analyzing character portrayal in fiction are valuable for writers and literary scholars in developing and interpreting compelling stories. Existing tools, such as visualization tools for analyzing fictional characters, primarily rely on explicit textual indicators of character attributes. However, portrayal is often implicit, revealed through actions and behaviors rather than explicit statements. We address this gap by leveraging large language models (LLMs) to uncover implicit character portrayals. We start by generating a dataset for this task with greater cross-topic similarity, lexical diversity, and narrative lengths than existing narrative text corpora such as TinyStories and WritingPrompts. We then introduce LIIPA (LLMs for Inferring Implicit Portrayal for Character Analysis), a framework for prompting LLMs to uncover character portrayals. LIIPA can be configured to use various types of intermediate computation (character attribute word lists, chain-of-thought) to infer how fictional characters are portrayed in the source text. We find that LIIPA outperforms existing approaches, and is more robust to increasing character counts (number of unique persons depicted) due to its ability to utilize full narrative context. Lastly, we investigate the sensitivity of portrayal estimates to character demographics, identifying a fairness-accuracy tradeoff among methods in our LIIPA framework -- a phenomenon familiar within the algorithmic fairness literature. Despite this tradeoff, all LIIPA variants consistently outperform non-LLM baselines in both fairness and accuracy. Our work demonstrates the potential benefits of using LLMs to analyze complex characters and to better understand how implicit portrayal biases may manifest in narrative texts.
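A minimal sketch of the kind of configurable prompting pipeline the abstract describes, where the intermediate computation (attribute word list, chain-of-thought, or none) is a parameter. The template wording and the `llm` callable are assumptions, not the paper's released implementation.

```python
def build_portrayal_prompt(narrative, character, mode="cot"):
    """Assemble a prompt asking how `character` is portrayed, optionally
    requesting an intermediate computation before the final answer."""
    base = (f"Narrative:\n{narrative}\n\n"
            f"How is the character '{character}' portrayed? Judge from "
            f"actions and behaviors, not only explicit statements.")
    if mode == "word_list":
        return base + ("\nFirst list attribute words describing the "
                       "character, then summarize the portrayal.")
    if mode == "cot":
        return base + ("\nReason step by step about the character's "
                       "actions before giving the final portrayal.")
    return base  # direct answer, no intermediate computation

def infer_portrayal(llm, narrative, character, mode="cot"):
    """`llm` is any callable mapping a prompt string to a completion."""
    return llm(build_portrayal_prompt(narrative, character, mode))
```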


She Works, He Works: A Curious Exploration of Gender Bias in AI-Generated Imagery

Foka, Amalia

arXiv.org Artificial Intelligence

The representation of gender within visual culture has been a fertile ground for critical inquiry, particularly within feminist scholarship. Griselda Pollock's seminal work, Vision and Difference (1988) [2], established a foundational framework for understanding how visual representations of women in art are not merely aesthetic choices, but are deeply intertwined with societal power dynamics and gender ideologies. Pollock's analysis demonstrates how these representations often function as "signs" that reinforce traditional gender roles and limit female agency, inspiring generations of scholars to scrutinize the ways visual culture shapes our understanding of gender and other social identities. This theoretical framework provides a critical lens through which to examine potential biases in AI-generated art and its impact on contemporary representations of gender. Following Pollock's groundbreaking work, feminist scholarship in visual culture has continued to evolve and expand.