 abortion


FlockVote: LLM-Empowered Agent-Based Modeling for Simulating U.S. Presidential Elections

Zhou, Lingfeng, Xu, Yi, Wang, Zhenyu, Wang, Dequan

arXiv.org Artificial Intelligence

Modeling complex human behavior, such as voter decisions in national elections, is a long-standing challenge for computational social science. Traditional agent-based models (ABMs) are limited by oversimplified rules, while large-scale statistical models often lack interpretability. We introduce FlockVote, a novel framework that uses Large Language Models (LLMs) to build a "computational laboratory" of LLM agents for political simulation. Each agent is instantiated with a high-fidelity demographic profile and dynamic contextual information (e.g. candidate policies), enabling it to perform nuanced, generative reasoning to simulate a voting decision. We deploy this framework as a testbed on the 2024 U.S. Presidential Election, focusing on seven key swing states. Our simulation's macro-level results successfully replicate the real-world outcome, demonstrating the high fidelity of our "virtual society". The primary contribution is not only the prediction, but also the framework's utility as an interpretable research tool. FlockVote moves beyond black-box outputs, allowing researchers to probe agent-level rationale and analyze the stability and sensitivity of LLM-driven social simulations.


How the Supreme Court Defines Liberty

The New Yorker

Recent memoirs by the Justices reveal how a new vision of restraint has led to radical outcomes. To understand how grudging Amy Coney Barrett's new book is when it comes to revealing personal details, consider that one of the family members the Supreme Court Justice most often refers to is a great-grandmother who died five years before she was born. On Barrett's desk at home, she recounts in "Listening to the Law," she keeps a photograph of her great-grandmother's one-story house, where, as a widow during the Great Depression, she raised some of her thirteen children and took in other needy relatives. "Looking at the photo reminds me of a woman who stretched herself beyond all reasonable capacity," Barrett explains. "I'm not sure that I'll be able to manage my life with the same grace that she had. But she motivates me to keep trying." For Barrett, the mother of seven children, that effort entails setting her alarm for 5 A.M. "Our kids get up at six thirty during the school year, so I start early if I want to accomplish anything on my own to-do list," she writes. This is what passes for disclosure from Barrett; she measures out the details of her life with coffee spoons, careful not to spill.


Value Drifts: Tracing Value Alignment During LLM Post-Training

Bhatia, Mehar, Nayak, Shravan, Kamath, Gaurav, Mosbach, Marius, Stańczak, Karolina, Shwartz, Vered, Reddy, Siva

arXiv.org Artificial Intelligence

As LLMs occupy an increasingly important role in society, they are more and more confronted with questions that require them not only to draw on their general knowledge but also to align with certain human value systems. Therefore, studying the alignment of LLMs with human values has become a crucial field of inquiry. Prior work, however, mostly focuses on evaluating the alignment of fully trained models, overlooking the training dynamics by which models learn to express human values. In this work, we investigate how and at which stage value alignment arises during the course of a model's post-training. Our analysis disentangles the effects of post-training algorithms and datasets, measuring both the magnitude and time of value drifts during training. Experimenting with Llama-3 and Qwen-3 models of different sizes and popular supervised fine-tuning (SFT) and preference optimization datasets and algorithms, we find that the SFT phase generally establishes a model's values, and subsequent preference optimization rarely re-aligns these values. Furthermore, using a synthetic preference dataset that enables controlled manipulation of values, we find that different preference optimization algorithms lead to different value alignment outcomes, even when preference data is held constant. Our findings provide actionable insights into how values are learned during post-training and help to inform data curation, as well as the selection of models and algorithms for preference optimization to improve model alignment to human values.


The Facial-Recognition Sham

The Atlantic - Technology

If you are going to promise users privacy, then you really need to follow through. Tea Dating Advice, a service that advertised itself as a safe space for women to anonymously share information about former partners--to warn others about abuse and cheating--says that it is locked down. Users are not allowed to take screenshots, and the app says it verifies that its users are women. So why did Tea let me, a middle-aged man, create an account just a few days after suffering two major security breaches? Last month, hackers wormed their way into Tea and accessed sensitive user data; 70,000 user images and more than 1 million private messages reportedly were leaked, including communications about abortions, users' driver's-license photos, and phone numbers that had been shared in private messages.


Emotionally Aware Moderation: The Potential of Emotion Monitoring in Shaping Healthier Social Media Conversations

Su, Xiaotian, Zierau, Naim, Kim, Soomin, Wang, April Yi, Wambsganss, Thiemo

arXiv.org Artificial Intelligence

Social media platforms increasingly employ proactive moderation techniques, such as detecting and curbing toxic and uncivil comments, to prevent the spread of harmful content. Despite these efforts, such approaches are often criticized for creating a climate of censorship and failing to address the underlying causes of uncivil behavior. Our work makes both theoretical and practical contributions by proposing and evaluating two types of emotion monitoring dashboards to raise users' emotional awareness and mitigate hate speech. In a study involving 211 participants, we evaluate the effects of the two mechanisms on user commenting behavior and emotional experiences. The results reveal that these interventions effectively increase users' awareness of their emotional states and reduce hate speech. However, our findings also indicate potential unintended effects, including increased expression of negative emotions (Angry, Fear, and Sad) when discussing sensitive issues. These insights provide a basis for further research on integrating proactive emotion regulation tools into social media platforms to foster healthier digital interactions.


Gendered Divides in Online Discussions about Reproductive Rights

Rao, Ashwin, Wang, Sze Yuh Nina, Lerman, Kristina

arXiv.org Artificial Intelligence

The U.S. Supreme Court's 2022 ruling in Dobbs v. Jackson Women's Health Organization marked a turning point in the national debate over reproductive rights. While the ideological divide over abortion is well documented, less is known about how gender and local sociopolitical contexts interact to shape public discourse. Drawing on nearly 10 million abortion-related posts on X (formerly Twitter) from users with inferred gender, ideology and location, we show that gender significantly moderates abortion attitudes and emotional expression, particularly in conservative regions, and independently of ideology. This creates a gender gap in abortion attitudes that grows more pronounced in conservative regions. The leak of the Dobbs draft opinion further intensified online engagement, disproportionately mobilizing pro-abortion women in areas where access was under threat. These findings reveal that abortion discourse is not only ideologically polarized but also deeply structured by gender and place, highlighting the central role of identity in shaping political expression during moments of institutional disruption. Long a flashpoint in cultural and political battles, abortion debates have come to symbolize broader struggles over bodily autonomy, religious freedom, and gender equality. The 2022 Supreme Court ruling in Dobbs v. Jackson Women's Health Organization, which overturned nearly five decades of federal protections for abortion access established by Roe v. Wade, marked a seismic shift. It not only intensified existing partisan divides (1, 2), but also reshaped the legal and political terrain, triggering abrupt policy reversals in many states and catalyzing a realignment in the national debate over reproductive rights. A growing body of research has documented partisan cleavages in public attitudes toward reproductive rights (1, 3-7).
However, less attention has been paid to the way in which gender and sociopolitical environment jointly shape both opinion formation and patterns of public expression. Recent surveys point to a widening gender gap in political orientation, particularly among younger voters. For example, in the 2024 U.S. presidential election, white men predominantly supported President Trump, while white women preferred Vice President Harris (8). Similarly, Gallup polling found a sharp increase in the share of young women identifying as politically liberal and supporting reproductive rights (9). While women consistently report higher support for abortion access, particularly in countries with less restrictive policy environments (10, 11), men, even those who identify as pro-choice, often show less engagement with the issue (11-13). Prior work has also documented gendered modes of engagement in online discourse around reproductive rights (1, 2).


ChatGPT is not A Man but Das Man: Representativeness and Structural Consistency of Silicon Samples Generated by Large Language Models

Li, Dai, Li, Linzhuo, Qiu, Huilian Sophie

arXiv.org Artificial Intelligence

Large language models (LLMs) in the form of chatbots like ChatGPT and Llama are increasingly proposed as "silicon samples" for simulating human opinions. This study examines this notion, arguing that LLMs may misrepresent population-level opinions. We identify two fundamental challenges: a failure in structural consistency, where response accuracy doesn't hold across demographic aggregation levels, and homogenization, an underrepresentation of minority opinions. To investigate these, we prompted ChatGPT (GPT-4) and Meta's Llama 3.1 series (8B, 70B, 405B) with questions on abortion and unauthorized immigration from the American National Election Studies (ANES) 2020. Our findings reveal significant structural inconsistencies and severe homogenization in LLM responses compared to human data. We propose an "accuracy-optimization hypothesis," suggesting homogenization stems from prioritizing modal responses. These issues challenge the validity of using LLMs, especially AI chatbots, as direct substitutes for human survey data, potentially reinforcing stereotypes and misinforming policy.


Revealing Political Bias in LLMs through Structured Multi-Agent Debate

Bandaru, Aishwarya, Bindley, Fabian, Bluth, Trevor, Chavda, Nandini, Chen, Baixu, Law, Ethan

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly used to simulate social behaviour, yet their political biases and interaction dynamics in debates remain underexplored. We investigate how LLM type and agent gender attributes influence political bias using a structured multi-agent debate framework, by engaging Neutral, Republican, and Democrat American LLM agents in debates on politically sensitive topics. We systematically vary the underlying LLMs, agent genders, and debate formats to examine how model provenance and agent personas influence political bias and attitudes throughout debates. We find that Neutral agents consistently align with Democrats, while Republicans shift closer to the Neutral position; gender influences agent attitudes, with agents adapting their opinions when aware of other agents' genders; and contrary to prior research, agents with shared political affiliations can form echo chambers, exhibiting the expected intensification of attitudes as debates progress.


Artificial Intelligence health advice accuracy varies across languages and contexts

Garg, Prashant, Fetzer, Thiemo

arXiv.org Artificial Intelligence

Using basic health statements authorized by UK and EU registers and ~9,100 journalist-vetted public-health assertions on topics such as abortion, COVID-19 and politics from sources ranging from peer-reviewed journals and government advisories to social media and news across the political spectrum, we benchmark six leading large language models in 21 languages, finding that -- despite high accuracy on English-centric textbook claims -- performance falls in multiple non-European languages and fluctuates by topic and source, highlighting the urgency of comprehensive multilingual, domain-aware validation before deploying AI in global health communication. Main Text: Recent evidence suggests that 17% of U.S. adults -- and a striking 25% of those aged 18-29 -- now consult AI chatbots for health questions at least once a month (1), while in Australia nearly 10% of adults did so in just the first half of 2024 (2). Beyond mere curiosity, these tools can substantially improve comprehension: running standard discharge notes through GPT-4 reduced the average reading grade level from 11th to 6th and boosted patient-understandability scores from 13% to 81% (3). Yet as fluently as large language models (LLMs) can rephrase medical text, they lack formal clinical vetting and still rely on statistical patterns in their training data. When generative AI echoes unverified or dangerous claims, it risks amplifying harm.


Polarized Online Discourse on Abortion: Frames and Hostile Expressions among Liberals and Conservatives

Rao, Ashwin, Chang, Rong-Ching, Zhong, Qiankun, Lerman, Kristina, Wojcieszak, Magdalena

arXiv.org Artificial Intelligence

Abortion has been one of the most divisive issues in the United States. Yet, missing is comprehensive longitudinal evidence on how political divides on abortion are reflected in public discourse over time, on a national scale, and in response to key events before and after the overturn of Roe v Wade. We analyze a corpus of over 3.5M tweets related to abortion over the span of one year (January 2022 to January 2023) from over 1.1M users. We estimate users' ideology and rely on state-of-the-art transformer-based classifiers to identify expressions of hostility and extract five prominent frames surrounding abortion. We use those data to examine (a) how prevalent were expressions of hostility (i.e., anger, toxic speech, insults, obscenities, and hate speech), (b) what frames liberals and conservatives used to articulate their positions on abortion, and (c) the prevalence of hostile expressions in liberals and conservative discussions of these frames. We show that liberals and conservatives largely mirrored each other's use of hostile expressions: as liberals used more hostile rhetoric, so did conservatives, especially in response to key events. In addition, the two groups used distinct frames and discussed them in vastly distinct contexts, suggesting that liberals and conservatives have differing perspectives on abortion. Lastly, frames favored by one side provoked hostile reactions from the other: liberals use more hostile expressions when addressing religion, fetal personhood, and exceptions to abortion bans, whereas conservatives use more hostile language when addressing bodily autonomy and women's health. This signals disrespect and derogation, which may further preclude understanding and exacerbate polarization.