backstory


Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions

Kang, Minwoo, Moon, Suhong, Lee, Seung Hyeong, Raj, Ayush, Suh, Joseph, Chan, David M., Canny, John

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly capable of simulating human behavior, offering cost-effective ways to estimate user responses to various surveys and polls. However, the questions in these surveys usually reflect socially understood attitudes: the patterns of attitudes of old/young, liberal/conservative, as understood by both members and non-members of those groups. It is not clear whether the LLM binding is "deep," meaning the LLM answers as a member of a particular in-group would, or "shallow," meaning the LLM responds as an out-group member believes an in-group member would. To explore this difference, we use questions that expose known in-group/out-group biases. This level of fidelity is critical for applying LLMs to various political science studies, including timely topics on polarization dynamics, inter-group conflict, and democratic backsliding. To this end, we propose a novel methodology for constructing virtual personas with synthetic user "backstories" generated as extended, multi-turn interview transcripts. This approach is justified by the theory of narrative identity, which argues that personality at the highest level is constructed from self-narratives. Our generated backstories are longer, richer in detail, and more consistent in authentically describing a single individual than those produced by previous methods. We show that virtual personas conditioned on our backstories closely replicate human response distributions (up to an 87% improvement as measured by Wasserstein distance) and produce effect sizes that closely match those observed in the original studies of in-group/out-group biases. Altogether, our work extends the applicability of LLMs beyond estimating socially understood responses, enabling their use in a broader range of human studies.
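The distributional match reported above is measured with the Wasserstein (earth mover's) distance, which for one-dimensional ordinal data reduces to summing absolute differences between cumulative distributions. A minimal sketch of that metric follows; the two example distributions are hypothetical, not taken from the paper:

```python
# Compare an LLM persona's survey-response distribution to a human
# reference using the 1-D Wasserstein (earth mover's) distance.
# Responses are assumed to lie on an ordinal scale (e.g. 5-point Likert),
# with unit spacing between adjacent options.

def wasserstein_1d(p, q):
    """W1 between two discrete distributions over the same ordinal options.

    In one dimension, W1 equals the sum of absolute differences between
    the two cumulative distribution functions.
    """
    assert len(p) == len(q)
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        dist += abs(cdf_p - cdf_q)
    return dist

# Hypothetical answer shares (human vs. model) on a 5-point scale.
human = [0.10, 0.20, 0.30, 0.25, 0.15]
model = [0.05, 0.15, 0.35, 0.30, 0.15]
print(round(wasserstein_1d(human, model), 3))  # → 0.2
```

Lower values indicate a closer match; a drop in this distance relative to a baseline persona method is what the reported "87% improvement" quantifies.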


Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication

Shen, Jocelyn, Yerukola, Akhila, Zhou, Xuhui, Breazeal, Cynthia, Sap, Maarten, Park, Hae Won

arXiv.org Artificial Intelligence

Conversational breakdowns in close relationships are deeply shaped by personal histories and emotional context, yet most NLP research treats conflict detection as a general task, overlooking the relational dynamics that influence how messages are perceived. In this work, we leverage nonviolent communication (NVC) theory to evaluate LLMs in detecting conversational breakdowns and assessing how relationship backstory influences both human and model perception of conflicts. Given the sensitivity and scarcity of real-world datasets featuring conflict between familiar social partners with rich personal backstories, we contribute the PersonaConflicts Corpus, a dataset of N=5,772 naturalistic simulated dialogues spanning diverse conflict scenarios between friends, family members, and romantic partners. Through a controlled human study, we annotate a subset of dialogues and obtain fine-grained labels of communication breakdown types on individual turns, and assess the impact of backstory on human and model perception of conflict in conversation. We find that the polarity of relationship backstories significantly shifted human perception of communication breakdowns and impressions of the social partners, yet models struggle to meaningfully leverage those backstories in the detection task. Additionally, we find that models consistently overestimate how positively a message will make a listener feel. Our findings underscore the critical role of personalization to relationship contexts in enabling LLMs to serve as effective mediators in human communication for authentic connection.


Shaping Event Backstories to Estimate Potential Emotion Contexts

Schäfer, Johannes, Klinger, Roman

arXiv.org Artificial Intelligence

Emotion analysis is an inherently ambiguous task. Previous work studied annotator properties to explain disagreement, but this overlooks the possibility that ambiguity may stem from missing information about the context of events. In this paper, we propose a novel approach that adds reasonable contexts to event descriptions, which may better explain a particular situation. Our goal is to understand whether these enriched contexts enable human annotators to annotate emotions more reliably. We disambiguate a target event description by automatically generating multiple event chains conditioned on differing emotions. By combining techniques from short story generation in various settings, we achieve coherent narratives that result in a specialized dataset for the first comprehensive and systematic examination of contextualized emotion analysis. Through automatic and human evaluation, we find that contextual narratives enhance the interpretation of specific emotions and support annotators in producing more consistent annotations.


SYNTHIA: Synthetic Yet Naturally Tailored Human-Inspired PersonAs

Rahimzadeh, Vahid, Monazzah, Erfan Moosavi, Pilehvar, Mohammad Taher, Yaghoobzadeh, Yadollah

arXiv.org Artificial Intelligence

Persona-driven LLMs have emerged as powerful tools in computational social science, yet existing approaches fall at opposite extremes, either relying on costly human-curated data or producing synthetic personas that lack consistency and realism. We introduce SYNTHIA, a dataset of 30,000 backstories derived from 10,000 real social media users on the BlueSky open platform across three time windows, bridging this spectrum by grounding synthetic generation in authentic user activity. Our evaluation demonstrates that SYNTHIA achieves competitive performance with state-of-the-art methods in demographic diversity and social survey alignment while significantly outperforming them in narrative consistency. Uniquely, SYNTHIA incorporates temporal dimensionality and provides rich social interaction metadata from the underlying network, enabling new research directions in computational social science and persona-driven language modeling.


ValueSim: Generating Backstories to Model Individual Value Systems

Du, Bangde, Ye, Ziyi, Wu, Zhijing, Jankowska, Monika, Zhu, Shuqi, Ai, Qingyao, Zhou, Yujia, Liu, Yiqun

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) continue to exhibit increasingly human-like capabilities, aligning them with human values has become critically important. Contemporary advanced techniques, such as prompt learning and reinforcement learning, are being deployed to better align LLMs with human values. However, while these approaches address broad ethical considerations and helpfulness, they rarely focus on simulating individualized human value systems. To address this gap, we present ValueSim, a framework that simulates individual values through the generation of personal backstories reflecting past experiences and demographic information. ValueSim converts structured individual data into narrative backstories and employs a multi-module architecture inspired by the Cognitive-Affective Personality System to simulate individual values based on these narratives. Testing ValueSim on a self-constructed benchmark derived from the World Values Survey demonstrates an improvement in top-1 accuracy by over 10% compared to retrieval-augmented generation methods. Further analysis reveals that performance improves as additional user interaction history becomes available, indicating the model's ability to refine its persona simulation capabilities over time.
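The first step the abstract describes, converting structured individual data into a narrative backstory, can be sketched as simple template rendering. This is a minimal illustration, not ValueSim's actual implementation; the field names and wording are assumptions:

```python
# Hedged sketch: render structured demographic data and past experiences
# into a first-person narrative backstory usable as an LLM prompt.
# All keys and template phrasing below are illustrative assumptions.

def backstory_prompt(profile: dict) -> str:
    lines = [
        f"I am a {profile['age']}-year-old {profile['occupation']} "
        f"living in {profile['country']}.",
    ]
    for event in profile.get("experiences", []):
        lines.append(f"One experience that shaped me: {event}.")
    lines.append("Answer the following survey question as this person would.")
    return "\n".join(lines)

profile = {
    "age": 42,
    "occupation": "teacher",
    "country": "Brazil",
    "experiences": ["moving to a large city in my twenties"],
}
print(backstory_prompt(profile))
```

In a full pipeline, the rendered narrative would condition the model's answers to World Values Survey-style items, with top-1 accuracy measured against the real respondent's choices.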


This 'College Protester' Isn't Real. It's an AI-Powered Undercover Bot for Cops

WIRED

American police departments near the United States-Mexico border are paying hundreds of thousands of dollars for an unproven and secretive technology that uses AI-generated online personas designed to interact with and collect intelligence on "college protesters," "radicalized" political activists, and suspected drug and human traffickers, according to internal documents, contracts, and communications that 404 Media obtained via public records requests. This article is copublished in partnership with 404 Media. Massive Blue, the New York–based company that is selling police departments this technology, calls its product Overwatch, which it markets as an "AI-powered force multiplier for public safety" that "deploys lifelike virtual agents, which infiltrate and engage criminal networks across various channels." According to a presentation obtained by 404 Media, Massive Blue is offering cops these virtual personas that can be deployed across the internet with the express purpose of interacting with suspects over text messages and social media. The technology--which as of last summer had not led to any known arrests--demonstrates the types of social media monitoring and undercover tools private companies are pitching to police and border agents.


REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems

Geng, Longling, Chang, Edward Y.

arXiv.org Artificial Intelligence

This benchmark suite provides a comprehensive evaluation framework for assessing both individual LLMs and multi-agent systems in real-world planning scenarios. The suite encompasses eleven designed problems that progress from basic to highly complex, incorporating key aspects such as multi-agent coordination, inter-agent dependencies, and dynamic environmental disruptions. Each problem can be scaled along three dimensions: the number of parallel planning threads, the complexity of inter-dependencies, and the frequency of unexpected disruptions requiring real-time adaptation. The benchmark includes detailed specifications, evaluation metrics, and baseline implementations using contemporary frameworks like LangGraph, enabling rigorous testing of both single-agent and multi-agent planning capabilities. Through standardized evaluation criteria and scalable complexity, this benchmark aims to drive progress in developing more robust and adaptable AI planning systems for real-world applications.


Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions

Yun, Taedong, Yang, Eric, Safdari, Mustafa, Lee, Jong Ha, Kumar, Vaishnavi Vinod, Mahdavi, S. Sara, Amar, Jonathan, Peyton, Derek, Aharony, Reut, Michaelides, Andreas, Schneider, Logan, Galatzer-Levy, Isaac, Jia, Yugang, Canny, John, Gretton, Arthur, Matarić, Maja

arXiv.org Artificial Intelligence

We present an end-to-end framework for generating synthetic users for evaluating interactive agents designed to encourage positive behavior changes, such as in health and lifestyle coaching. The synthetic users are grounded in health and lifestyle conditions, specifically sleep and diabetes management in this study, to ensure realistic interactions with the health coaching agent. Synthetic users are created in two stages: first, structured data are generated grounded in real-world health and lifestyle factors in addition to basic demographics and behavioral attributes; second, full profiles of the synthetic users are developed conditioned on the structured data. Interactions between synthetic users and the coaching agent are simulated using generative agent-based models such as Concordia, or directly by prompting a language model. Using two independently-developed agents for sleep and diabetes coaching as case studies, the validity of this framework is demonstrated by analyzing the coaching agent's understanding of the synthetic users' needs and challenges. Finally, through multiple blinded evaluations of user-coach interactions by human experts, we demonstrate that our synthetic users with health and behavioral attributes more accurately portray real human users with the same attributes, compared to generic synthetic users not grounded in such attributes. The proposed framework lays the foundation for efficient development of conversational agents through extensive, realistic, and grounded simulated interactions.
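The two-stage generation process described above (structured attributes first, full profile second) can be sketched as follows. All attribute names, value ranges, and template text here are illustrative assumptions rather than the authors' actual schema:

```python
# Hedged two-stage sketch of synthetic-user creation for coaching agents.
# Stage 1 samples structured data grounded in a health condition;
# stage 2 renders a full user profile conditioned on that data.
import random

def stage1_structured(condition: str, rng: random.Random) -> dict:
    # Structured data: basic demographics plus condition-specific attributes.
    return {
        "condition": condition,
        "age": rng.randint(25, 70),
        "sleep_hours": round(rng.uniform(4.0, 8.0), 1),
        "goal": ("improve sleep quality" if condition == "insomnia"
                 else "stabilize blood glucose"),
    }

def stage2_profile(s: dict) -> str:
    # Full profile conditioned on the structured data from stage 1.
    return (
        f"A {s['age']}-year-old managing {s['condition']}, "
        f"averaging {s['sleep_hours']} hours of sleep per night, "
        f"whose coaching goal is to {s['goal']}."
    )

rng = random.Random(0)  # seeded for reproducible sampling
print(stage2_profile(stage1_structured("insomnia", rng)))
```

Grounding stage 2 in stage-1 attributes is what keeps the simulated coaching interactions consistent with a plausible underlying health condition, which is the property the blinded expert evaluations test.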


Goodbye cartoon breasts, hello sweat stains: the feminist reinvention of Tomb Raider

The Guardian

Hot on the heels of that Oasis reunion comes news of the return of another 90s icon – Lara Croft. She bounds back on to our screens with a new animated series, still sporting that holy triumvirate of classic ponytail, backpack and combat boots. From the get-go she's performing seemingly impossible feats in the name of archaeology: she outswims a ravenous crocodile, and uses her signature blend of parkour and gymnastics to avoid a pit of sharp spikes. But this isn't the Tomb Raider star quite as you might remember her. The eponymous star of Netflix's Tomb Raider: The Legend of Lara Croft – voiced by Agent Carter's Hayley Atwell – looks different to how she appeared in the original games.


Virtual Personas for Language Models via an Anthology of Backstories

Moon, Suhong, Abdulhai, Marwa, Kang, Minwoo, Suh, Joseph, Soedarmadji, Widyadewi, Behar, Eran Kohen, Chan, David M.

arXiv.org Artificial Intelligence

Large language models (LLMs) are trained on vast repositories of text authored by millions of distinct authors, reflecting an enormous diversity of human traits. While these models have the potential to be used as approximations of human subjects in behavioral studies, prior efforts have been limited in steering model responses to match individual human users. In this work, we introduce "Anthology", a method for conditioning LLMs to particular virtual personas by harnessing open-ended life narratives, which we refer to as "backstories." We show that our methodology enhances the consistency and reliability of experimental outcomes while ensuring better representation of diverse sub-populations. Across three nationally representative human surveys conducted as part of Pew Research Center's American Trends Panel (ATP), we demonstrate that Anthology achieves up to 18% improvement in matching the response distributions of human respondents and 27% improvement in consistency metrics. Our code and generated backstories are available at https://github.com/CannyLab/anthology.