Goto

Collaborating Authors

 Simulation of Human Behavior


Do Language Models Agree with Human Perceptions of Suspense in Stories?

arXiv.org Artificial Intelligence

Suspense is an affective response to narrative text that is believed to involve complex cognitive processes in humans. Several psychological models have been developed to describe this phenomenon and the circumstances under which text might trigger it. We replicate four seminal psychological studies of human perceptions of suspense, substituting human responses with those of different open-weight and closed-source LMs. We conclude that while LMs can distinguish whether a text is intended to induce suspense in people, LMs cannot accurately estimate the relative amount of suspense within a text sequence as compared to human judgments, nor can LMs properly capture the human perception for the rise and fall of suspense across multiple text segments. We probe the abilities of LM suspense understanding by adversarially permuting the story text to identify what cause human and LM perceptions of suspense to diverge. We conclude that, while LMs can superficially identify and track certain facets of suspense, they do not process suspense in the same way as human readers.


Organ-Agents: Virtual Human Physiology Simulator via LLMs

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) have enabled new possibilities in simulating complex physiological systems. We introduce Organ-Agents, a multi-agent framework that simulates human physiology via LLM-driven agents. Each Simulator models a specific system (e.g., cardiovascular, renal, immune). Training consists of supervised fine-tuning on system-specific time-series data, followed by reinforcement-guided coordination using dynamic reference selection and error correction. We curated data from 7,134 sepsis patients and 7,895 controls, generating high-resolution trajectories across 9 systems and 125 variables. Organ-Agents achieved high simulation accuracy on 4,509 held-out patients, with per-system MSEs <0.16 and robustness across SOFA-based severity strata. External validation on 22,689 ICU patients from two hospitals showed moderate degradation under distribution shifts with stable simulation. Organ-Agents faithfully reproduces critical multi-system events (e.g., hypotension, hyperlactatemia, hypoxemia) with coherent timing and phase progression. Evaluation by 15 critical care physicians confirmed realism and physiological plausibility (mean Likert ratings 3.9 and 3.7). Organ-Agents also enables counterfactual simulations under alternative sepsis treatment strategies, generating trajectories and APACHE II scores aligned with matched real-world patients. In downstream early warning tasks, classifiers trained on synthetic data showed minimal AUROC drops (<0.04), indicating preserved decision-relevant patterns. These results position Organ-Agents as a credible, interpretable, and generalizable digital twin for precision diagnosis, treatment simulation, and hypothesis testing in critical care.


Chinese 'Virtual Human' Salespeople Are Outperforming Their Real Human Counterparts

WIRED

The salesperson hawking Brother printers on Taobao works hard--like, really hard. At any time of the day, even when there's no audience on the Chinese ecommerce platform, the same woman wearing a white shirt and black skirt is always livestreaming, boasting about the various features of different office printers. She has a phone in one hand and often checks it as if to read a sales script or monitor the viewer comments coming in. "My friends, I've gotta plug this game-changing office tool that can double your workplace efficiency, " the salesperson said during one recent broadcast, trying to achieve the delicate balance between friendliness and precision that has come to define the billion-dollar livestream ecommerce industry in China. Occasionally, she greeted the invisible audience.


Helen Oyeyemi's Novel of Cognitive Dissonance

The New Yorker

Few fantasies are harder to wipe away than the romance of a clean slate. Every January, when we're twitchy with regret and self-loathing, advertisers blare, "New Year, new you," urging us to jettison our failures and start fresh. In fiction, self-reinvention is a perennial theme, often shadowed by the suspicion that it can't be done. Lately, novelists have put a political spin on the idea, counterposing hopeful acts of individual self-fashioning to the immovable weight of circumstance. Halle Butler's "The New Me" (2019), a millennial office satire, finds its temp heroine, Millie, trying to life-hack her way out of loneliness and professional drift--buy a plant, whiten her teeth, make friends, think positive.


AI That Helps Us Help Each Other: A Proactive System for Scaffolding Mentor-Novice Collaboration in Entrepreneurship Coaching

arXiv.org Artificial Intelligence

Entrepreneurship requires navigating open-ended, ill-defined problems: identifying risks, challenging assumptions, and making strategic decisions under deep uncertainty. Novice founders often struggle with these metacognitive demands, while mentors face limited time and visibility to provide tailored support. We present a human-AI coaching system that combines a domain-specific cognitive model of entrepreneurial risk with a large language model (LLM) to proactively scaffold both novice and mentor thinking. The system proactively poses diagnostic questions that challenge novices' thinking and helps both novices and mentors plan for more focused and emotionally attuned meetings. Critically, mentors can inspect and modify the underlying cognitive model, shaping the logic of the system to reflect their evolving needs. Through an exploratory field deployment, we found that using the system supported novice metacognition, helped mentors plan emotionally attuned strategies, and improved meeting depth, intentionality, and focus--while also surfaced key tensions around trust, misdiagnosis, and expectations of AI. We contribute design principles for proactive AI systems that scaffold metacognition and human-human collaboration in complex, ill-defined domains, offering implications for similar domains like healthcare, education, and knowledge work.


Understanding Human Limits in Pattern Recognition: A Computational Model of Sequential Reasoning in Rock, Paper, Scissors

arXiv.org Artificial Intelligence

How do we predict others from patterns in their behavior and what are the computational constraints that limit this ability? We investigate these questions by modeling human behavior over repeated games of rock, paper, scissors from Brockbank & Vul (2024). Against algorithmic opponents that varied in strategic sophistication, people readily exploit simple transition patterns (e.g., consistently playing rock after paper) but struggle to detect more complex sequential dependencies. To understand the cognitive mechanisms underlying these abilities and their limitations, we deploy Hypothetical Minds (HM), a large language model-based agent that generates and tests hypotheses about opponent strategies, as a cognitive model of this behavior (Cross et al., 2024). We show that when applied to the same experimental conditions, HM closely mirrors human performance patterns, succeeding and failing in similar ways. To better understand the source of HM's failures and whether people might face similar cognitive bottlenecks in this context, we performed a series of ablations and augmentations targeting different components of the system. When provided with natural language descriptions of the opponents' strategies, HM successfully exploited 6/7 bot opponents with win rates >80% suggesting that accurate hypothesis generation is the primary cognitive bottleneck in this task. Further, by systematically manipulating the model's hypotheses through pedagogically-inspired interventions, we find that the model substantially updates its causal understanding of opponent behavior, revealing how model-based analyses can produce testable hypotheses about human cognition.


Transferring Expert Cognitive Models to Social Robots via Agentic Concept Bottleneck Models

arXiv.org Artificial Intelligence

Successful group meetings, such as those implemented in group behavioral-change programs, work meetings, and other social contexts, must promote individual goal setting and execution while strengthening the social relationships within the group. Consequently, an ideal facilitator must be sensitive to the subtle dynamics of disengagement, difficulties with individual goal setting and execution, and interpersonal difficulties that signal a need for intervention. The challenges and cognitive load experienced by facilitators create a critical gap for an embodied technology that can interpret social exchanges while remaining aware of the needs of the individuals in the group and providing transparent recommendations that go beyond powerful but "black box" foundation models (FMs) that identify social cues. We address this important demand with a social robot co-facilitator that analyzes multimodal meeting data and provides discreet cues to the facilitator. The robot's reasoning is powered by an agentic concept bottleneck model (CBM), which makes decisions based on human-interpretable concepts like participant engagement and sentiments, ensuring transparency and trustworthiness. Our core contribution is a transfer learning framework that distills the broad social understanding of an FM into our specialized and transparent CBM. This concept-driven system significantly outperforms direct zero-shot FMs in predicting the need for intervention and enables real-time human correction of its reasoning. Critically, we demonstrate robust knowledge transfer: the model generalizes across different groups and successfully transfers the expertise of senior human facilitators to improve the performance of novices. By transferring an expert's cognitive model into an interpretable robotic partner, our work provides a powerful blueprint for augmenting human capabilities in complex social domains.


Six Guidelines for Trustworthy, Ethical and Responsible Automation Design

arXiv.org Artificial Intelligence

Calibrated trust in automated systems (Lee and See 2004) is critical for their safe and seamless integration into society. Users should only rely on a system recommendation when it is actually correct and reject it when it is factually wrong. One requirement to achieve this goal is an accurate trustworthiness assessment, ensuring that the user's perception of the system's trustworthiness aligns with its actual trustworthiness, allowing users to make informed decisions about the extent to which they can rely on the system (Schlicker et al. 2022). We propose six design guidelines to help designers optimize for accurate trustworthiness assessments, thus fostering ethical and responsible human-automation interactions. The proposed guidelines are derived from existing literature in various fields, such as human-computer interaction, cognitive psychology, automation research, user-experience design, and ethics. We are incorporating key principles from the field of pragmatics, specifically the cultivation of common ground (H. H. Clark 1996) and Gricean communication maxims (Grice 1975). These principles are essential for the design of automated systems because the user's perception of the system's trustworthiness is shaped by both environmental contexts, such as organizational culture or societal norms, and by situational context, including the specific circumstances or scenarios in which the interaction occurs (Hoff and Bashir 2015). Our proposed guidelines provide actionable insights for designers to create automated systems that make relevant trustworthiness cues available. This would ideally foster calibrated trust and more satisfactory, productive, and safe interactions between humans and automated systems. Furthermore, the proposed heuristics might work as a tool for evaluating to what extent existing systems enable users to accurately assess a system's trustworthiness.


Opacity as Authority: Arbitrariness and the Preclusion of Contestation

arXiv.org Artificial Intelligence

This article redefines arbitrariness not as a normative flaw or a symptom of domination, but as a foundational functional mechanism structuring human systems and interactions. Diverging from critical traditions that conflate arbitrariness with injustice, it posits arbitrariness as a semiotic trait: a property enabling systems - linguistic, legal, or social - to operate effectively while withholding their internal rationale. Building on Ferdinand de Saussure's concept of l'arbitraire du signe, the analysis extends this principle beyond language to demonstrate its cross-domain applicability, particularly in law and social dynamics. The paper introduces the "Motivation -> Constatability -> Contestability" chain, arguing that motivation functions as a crucial interface rendering an act's logic vulnerable to intersubjective contestation. When this chain is broken through mechanisms like "immotivization" or "Conflict Lateralization" (exemplified by "the blur of the wolf drowned in the fish"), acts produce binding effects without exposing their rationale, thus precluding justiciability. This structural opacity, while appearing illogical, is a deliberate design protecting authority from accountability. Drawing on Shannon's entropy model, the paper formalizes arbitrariness as A = H(L|M) (conditional entropy). It thereby proposes a modern theory of arbitrariness as a neutral operator central to control as well as care, an overlooked dimension of interpersonal relations. While primarily developed through human social systems, this framework also illuminates a new pathway for analyzing explainability in advanced artificial intelligence systems.


The Incomplete Bridge: How AI Research (Mis)Engages with Psychology

arXiv.org Artificial Intelligence

Social sciences have accumulated a rich body of theories and methodologies for investigating the human mind and behaviors, while offering valuable insights into the design and understanding of Artificial Intelligence (AI) systems. Focusing on psychology as a prominent case, this study explores the interdisciplinary synergy between AI and the field by analyzing 1,006 LLM-related papers published in premier AI venues between 2023 and 2025, along with the 2,544 psychology publications they cite. Through our analysis, we identify key patterns of interdisciplinary integration, locate the psychology domains most frequently referenced, and highlight areas that remain underexplored. We further examine how psychology theories/frameworks are operationalized and interpreted, identify common types of misapplication, and offer guidance for more effective incorporation. Our work provides a comprehensive map of interdisciplinary engagement between AI and psychology, thereby facilitating deeper collaboration and advancing AI systems.