Goto

Collaborating Authors

 Agents


BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) have enabled social simulation through multi-agent systems. Prior efforts focus on agent societies created from scratch, assigning agents with newly defined personas. However, simulating established fictional worlds and characters remain largely underexplored, despite its significant practical value. In this paper, we introduce BookWorld, a comprehensive system for constructing and simulating book-based multi-agent societies. BookWorld's design covers comprehensive real-world intricacies, including diverse and dynamic characters, fictional worldviews, geographical constraints and changes, e.t.c. BookWorld enables diverse applications including story generation, interactive games and social simulation, offering novel ways to extend and explore beloved fictional works. Through extensive experiments, we demonstrate that BookWorld generates creative, high-quality stories while maintaining fidelity to the source books, surpassing previous methods with a win rate of 75.36%. The code of this paper can be found at the project page: https://bookworld2025.github.io/.


Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey

arXiv.org Artificial Intelligence

--This survey explores the development of meta-thinking capabilities in Large Language Models (LLMs) from a Multi-Agent Reinforcement Learning (MARL) perspective. The survey begins by analyzing current LLM limitations, such as hallucinations and the lack of internal self-assessment mechanisms. It then talks about newer methods, including RL from human feedback (RLHF), self-distillation, and chain-of-thought prompting, and each of their limitations. The crux of the survey is to talk about how multi-agent architectures, namely supervisor-agent hierarchies, agent debates, and theory of mind frameworks, can emulate human-like introspective behavior and enhance LLM robustness. By exploring reward mechanisms, self-play, and continuous learning methods in MARL, this survey gives a comprehensive roadmap to building introspective, adaptive, and trustworthy LLMs. Evaluation metrics, datasets, and future research avenues, including neuroscience-inspired architectures and hybrid symbolic reasoning, are also discussed. THE cognitive abilities, such as intelligence and creativity, have played a fundamental role in human discoveries and inventions. Understanding the relationship between these two cognitive abilities is important not only for the advancement of psychological theories but also for the improvement of educational practices [1]. However, researchers still hold different views on how intelligence and creativity interact, often leading to conflicting findings. A key question in this discourse is how intelligence enables structured problem-solving, while creativity fosters novel solutions that are essential for human cognition and artificial intelligence systems. Ahsan Bilal is with University of Oklahoma, Norman, OK, 73072, USA (e-mail: ahsan.bilal-1@ou.edu). Muhammad Ahmed Mohsin, Muhammad Umer are with Stanford University, Stanford, CA, 94305, USA (e-mail: muahmed, mumer@stanford.edu). Muhammad A wais Khan Bangash is with the School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK, 74075 USA (e-mail: awais.bangash@okstate.edu). Muhammad Ali Jamshed is with University of Glasgow, G12 8QQ, Glasgow, UK (e-mail: muhammadali.jamshed@glasgow.ac.uk). Similarly, in problem-solving tasks, intelligence aids in analyzing constraints, while creativity allows for flexible and unconventional approaches. Moreover, the role of internal thought processes varies with task complexity. Simpler tasks require minimal reasoning, whereas more complex tasks demand deeper cognitive engagement. This principle extends to artificial intelligence, where more sophisticated models exhibit enhanced performance in tasks requiring higher-order thinking.


DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue

arXiv.org Artificial Intelligence

Speech synthesis is crucial for human-computer interaction, enabling natural and intuitive communication. However, existing datasets involve high construction costs due to manual annotation and suffer from limited character diversity, contextual scenarios, and emotional expressiveness. To address these issues, we propose DialogueAgents, a novel hybrid agent-based speech synthesis framework, which integrates three specialized agents -- a script writer, a speech synthesizer, and a dialogue critic -- to collaboratively generate dialogues. Grounded in a diverse character pool, the framework iteratively refines dialogue scripts and synthesizes speech based on speech review, boosting emotional expressiveness and paralinguistic features of the synthesized dialogues. Using DialogueAgent, we contribute MultiTalk, a bilingual, multi-party, multi-turn speech dialogue dataset covering diverse topics. Extensive experiments demonstrate the effectiveness of our framework and the high quality of the MultiTalk dataset. We release the dataset and code https://github.com/uirlx/DialogueAgents to facilitate future research on advanced speech synthesis models and customized data generation.


Optimizing SIA Development: A Case Study in User-Centered Design for Estuary, a Multimodal Socially Interactive Agent Framework

arXiv.org Artificial Intelligence

This case study presents our user-centered design model for Socially Intelligent Agent (SIA) development frameworks through our experience developing Estuary, an open source multimodal framework for building low-latency real-time socially interactive agents. We leverage the Rapid Assessment Process (RAP) to collect the thoughts of leading researchers in the field of SIAs regarding the current state of the art for SIA development as well as their evaluation of how well Estuary may potentially address current research gaps. We achieve this through a series of end-user interviews conducted by a fellow researcher in the community. We hope that the findings of our work will not only assist the continued development of Estuary but also guide the development of other future frameworks and technologies for SIAs.


Birds of a Different Feather Flock Together: Exploring Opportunities and Challenges in Animal-Human-Machine Teaming

arXiv.org Artificial Intelligence

Birds of a Different Feather Flock Together: Exploring Opportunities and Challenges in Animal-Human-Machine Teaming Myke C. Cohen 1,2, David A. Grimm 3, Reuth Mirsky 4, and Xiaoyun Yin 1 1 Arizona State University, Mesa, AZ 2 Aptima, Inc., Woburn, MA 3 Georgia Institute of Technology, Atlanta, GA 4 Tufts University, Medford, MA Abstract Animal-Human-Machine (AHM) teams are a type of hybrid intelligence system wherein interactions between a human, AI-enabled machine, and animal members can result in unique capabilities greater than the sum of their parts. This paper calls for a systematic approach to studying the design of AHM team structures to optimize performance and overcome limitations in various applied settings. We consider the challenges and opportunities in investigating the synergistic potential of AHM team members by introducing a set of dimensions of AHM team functioning to effectively utilize each member's strengths while compensating for individual weaknesses. Using three representative examples of such teams--security screening, search-and-rescue, and guide dogs--the paper illustrates how AHM teams can tackle complex tasks. We conclude with open research directions that this multidimensional approach presents for studying hybrid human-AI systems beyond AHM teams. Keywords: multi-agent systems, animal-human-machine teaming, functional allocation 1 Introduction Consider a Blind or Visually Impaired (BVI) person training to be assisted by a guide dog. When the pair reaches an obstacle along their path, they 1 arXiv:2504.13973v1


Designing Empathetic Companions: Exploring Personality, Emotion, and Trust in Social Robots

arXiv.org Artificial Intelligence

Designing Empathetic Companions: Exploring Personality, Emotion, and Trust in Social Robots Alice Nardelli* 1, Antonio Sgorbissa 1, Carmine Tommaso Recchiuto 1 Abstract -- How should a companion robot behave? In this research, we present a cognitive architecture based on a tailored personality model to investigate the impact of robotic personalities on the perception of companion robots. Drawing from existing literature, we identified empathy, trust, and enjoyability as key factors in building companionship with social robots. Based on these insights, we implemented a personality-dependent, emotion-aware generator, recognizing the crucial role of robot emotions in shaping these elements. We then conducted a user study involving 84 dyadic conversation sessions with the emotional robot Navel, which exhibited different personalities. Results were derived from a multimodal analysis, including questionnaires, open-ended responses, and behavioral observations. This approach allowed us to validate the developed emotion generator and explore the relationship between the personality traits of Agreeableness, Extraversion, Conscientiousness, and Empathy. Furthermore, we drew robust conclusions on how these traits influence relational trust, capability trust, enjoyability, and sociability.


Evaluation and Incident Prevention in an Enterprise AI Assistant

arXiv.org Artificial Intelligence

Enterprise AI Assistants are increasingly deployed in domains where accuracy is paramount, making each erroneous output a potentially significant incident. This paper presents a comprehensive framework for monitoring, benchmarking, and continuously improving such complex, multi-component systems under active development by multiple teams. Our approach encompasses three key elements: (1) a hierarchical ``severity'' framework for incident detection that identifies and categorizes errors while attributing component-specific error rates, facilitating targeted improvements; (2) a scalable and principled methodology for benchmark construction, evaluation, and deployment, designed to accommodate multiple development teams, mitigate overfitting risks, and assess the downstream impact of system modifications; and (3) a continual improvement strategy leveraging multidimensional evaluation, enabling the identification and implementation of diverse enhancement opportunities. By adopting this holistic framework, organizations can systematically enhance the reliability and performance of their AI Assistants, ensuring their efficacy in critical enterprise environments. We conclude by discussing how this multifaceted evaluation approach opens avenues for various classes of enhancements, paving the way for more robust and trustworthy AI systems.


The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning

arXiv.org Artificial Intelligence

Our work aims to advance the social reasoning of embodied artificial intelligence (AI) agents in real-world social interactions. Recently, language models (LMs) and foundational models (FMs) are being utilized as automatic evaluators of human-AI interactions with the goal of eventually being used to improve the policy of the AI agent. To enable further research in this direction, we introduce a large-scale real-world Human Robot Social Interaction (HSRI) Dataset to benchmark the capabilities of LMs and FMs to identify and reason about social interactions, specifically with regard to robot social errors and competencies . Our dataset consists of 400 real-world human social robot interaction videos and over 10K annotations, detailing the robot's social errors, competencies, rationale, and corrective actions, capturing unique aspects of human-AI interaction only present in real-world interactions. To further assess AI models' ability to reason about social interactions, we propose eight new benchmark tasks for evaluating centered around whether AI models can (1) evaluate social interactions via detecting social errors and competencies, (2) identify the explanatory factors associated to errors and competencies, (3) understand the flow of real-world social interactions, and (4) provide reasons and corrective actions for social errors. Human studies and experiments with modern LMs and FMs reveal that current models struggle with these tasks, demonstrating that our dataset and benchmark provides a step forward towards socially intelligent AI.


Human aversion? Do AI Agents Judge Identity More Harshly Than Performance

arXiv.org Artificial Intelligence

This study examines the understudied role of algorithmic evaluation of human judgment in hybrid decision-making systems, a critical gap in management research. While extant literature focuses on human reluctance to follow algorithmic advice, we reverse the perspective by investigating how AI agents based on large language models (LLMs) assess and integrate human input. Our work addresses a pressing managerial constraint: firms barred from deploying LLMs directly due to privacy concerns can still leverage them as mediating tools (for instance, anonymized outputs or decision pipelines) to guide high-stakes choices like pricing or discounts without exposing proprietary data. Through a controlled prediction task, we analyze how an LLM-based AI agent weights human versus algorithmic predictions. We find that the AI system systematically discounts human advice, penalizing human errors more severely than algorithmic errors--a bias exacerbated when the agent's identity (human vs AI) is disclosed and the human is positioned second. These results reveal a disconnect between AI-generated trust metrics and the actual influence of human judgment, challenging assumptions about equitable human-AI collaboration. Our findings offer three key contributions. First, we identify a reverse algorithm aversion phenomenon, where AI agents undervalue human input despite comparable error rates. Second, we demonstrate how disclosure and positional bias interact to amplify this effect, with implications for system design. Third, we provide a framework for indirect LLM deployment that balances predictive power with data privacy. For practitioners, this research emphasize the need to audit AI weighting mechanisms, calibrate trust dynamics, and strategically design decision sequences in human-AI systems.


LangCoop: Collaborative Driving with Language

arXiv.org Artificial Intelligence

Multi-agent collaboration holds great promise for enhancing the safety, reliability, and mobility of autonomous driving systems by enabling information sharing among multiple connected agents. However, existing multi-agent communication approaches are hindered by limitations of existing communication media, including high bandwidth demands, agent heterogeneity, and information loss. To address these challenges, we introduce LangCoop, a new paradigm for collaborative autonomous driving that leverages natural language as a compact yet expressive medium for inter-agent communication. LangCoop features two key innovations: Mixture Model Modular Chain-of-thought (M$^3$CoT) for structured zero-shot vision-language reasoning and Natural Language Information Packaging (LangPack) for efficiently packaging information into concise, language-based messages. Through extensive experiments conducted in the CARLA simulations, we demonstrate that LangCoop achieves a remarkable 96\% reduction in communication bandwidth (< 2KB per message) compared to image-based communication, while maintaining competitive driving performance in the closed-loop evaluation. Our project page and code are at https://xiangbogaobarry.github.io/LangCoop/.