AITopics

Recent advances in large language models (LLMs) have enabled social simulation through multi-agent systems. Prior efforts focus on agent societies created from scratch, assigning agents newly defined personas. However, simulating established fictional worlds and characters remains largely underexplored, despite its significant practical value. In this paper, we introduce BookWorld, a comprehensive system for constructing and simulating book-based multi-agent societies. BookWorld's design covers comprehensive real-world intricacies, including diverse and dynamic characters, fictional worldviews, geographical constraints and changes, etc. BookWorld enables diverse applications including story generation, interactive games and social simulation, offering novel ways to extend and explore beloved fictional works. Through extensive experiments, we demonstrate that BookWorld generates creative, high-quality stories while maintaining fidelity to the source books, surpassing previous methods with a win rate of 75.36%. The code of this paper can be found at the project page: https://bookworld2025.github.io/.
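
BookWorld's abstract lists geographical constraints and changes among the intricacies it models. One plausible way to represent them, sketched here under our own assumptions, is a weighted graph of locations: a character's travel time is a shortest-path distance (Dijkstra's algorithm), and a geographical change removes a route. The names `travel_time`, `close_route`, `WorldMap`, and the turn-based costs are hypothetical; this is not BookWorld's implementation.

```python
import heapq
from typing import Optional

# Hypothetical world map: location -> {neighboring location: travel cost in simulation turns}.
WorldMap = dict[str, dict[str, int]]


def travel_time(world: WorldMap, start: str, goal: str) -> Optional[int]:
    """Return the minimum number of turns needed to reach `goal`, or None if unreachable.

    Uses Dijkstra's algorithm, so every travel cost must be non-negative.
    """
    best = {start: 0}
    frontier = [(0, start)]
    while frontier:
        turns, location = heapq.heappop(frontier)
        if location == goal:
            return turns
        if turns > best[location]:
            continue  # stale queue entry: a shorter route to this location was already found
        for neighbor, cost in world.get(location, {}).items():
            candidate = turns + cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(frontier, (candidate, neighbor))
    return None


def close_route(world: WorldMap, a: str, b: str) -> None:
    """Model a geographical change (e.g. a destroyed bridge) by removing a two-way route."""
    world.get(a, {}).pop(b, None)
    world.get(b, {}).pop(a, None)
```

With a symmetric map where `village` connects to `forest` (2 turns) and `harbor` (5), and `castle` connects to `forest` (3) and `harbor` (1), `travel_time(world, "village", "castle")` returns 5; after `close_route(world, "forest", "castle")` the same query returns 6, because the only remaining route runs through the harbor.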

agent, artificial intelligence, bookworld

2504.14538

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)

Bilal, Ahsan, Mohsin, Muhammad Ahmed, Umer, Muhammad, Bangash, Muhammad Awais Khan, Jamshed, Muhammad Ali

Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey

This survey explores the development of meta-thinking capabilities in Large Language Models (LLMs) from a Multi-Agent Reinforcement Learning (MARL) perspective. The survey begins by analyzing current LLM limitations, such as hallucinations and the lack of internal self-assessment mechanisms. It then reviews newer methods, including RL from human feedback (RLHF), self-distillation, and chain-of-thought prompting, along with the limitations of each. The core of the survey discusses how multi-agent architectures, namely supervisor-agent hierarchies, agent debates, and theory of mind frameworks, can emulate human-like introspective behavior and enhance LLM robustness. By exploring reward mechanisms, self-play, and continuous learning methods in MARL, this survey offers a comprehensive roadmap for building introspective, adaptive, and trustworthy LLMs. Evaluation metrics, datasets, and future research avenues, including neuroscience-inspired architectures and hybrid symbolic reasoning, are also discussed.

The cognitive abilities, such as intelligence and creativity, have played a fundamental role in human discoveries and inventions. Understanding the relationship between these two cognitive abilities is important not only for the advancement of psychological theories but also for the improvement of educational practices [1]. However, researchers still hold different views on how intelligence and creativity interact, often leading to conflicting findings. A key question in this discourse is how intelligence enables structured problem-solving, while creativity fosters novel solutions that are essential for human cognition and artificial intelligence systems.

Ahsan Bilal is with University of Oklahoma, Norman, OK, 73072, USA (e-mail: ahsan.bilal-1@ou.edu). Muhammad Ahmed Mohsin and Muhammad Umer are with Stanford University, Stanford, CA, 94305, USA (e-mail: muahmed, mumer@stanford.edu). Muhammad Awais Khan Bangash is with the School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK, 74075 USA (e-mail: awais.bangash@okstate.edu). Muhammad Ali Jamshed is with University of Glasgow, G12 8QQ, Glasgow, UK (e-mail: muhammadali.jamshed@glasgow.ac.uk).

Similarly, in problem-solving tasks, intelligence aids in analyzing constraints, while creativity allows for flexible and unconventional approaches. Moreover, the role of internal thought processes varies with task complexity. Simpler tasks require minimal reasoning, whereas more complex tasks demand deeper cognitive engagement. This principle extends to artificial intelligence, where more sophisticated models exhibit enhanced performance in tasks requiring higher-order thinking.
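
The survey names agent debates as one multi-agent route to introspective behavior. As a minimal sketch, assuming a generic multi-round debate with majority voting (not a protocol the survey prescribes), each agent first answers independently, then repeatedly revises after seeing its peers' previous answers. The toy agents `stubborn` and `conformist` are hypothetical stand-ins for LLM calls.

```python
from collections import Counter
from typing import Callable

# A (hypothetical) agent maps a question and its peers' latest answers to an answer.
Agent = Callable[[str, list[str]], str]


def debate(question: str, agents: list[Agent], rounds: int = 2) -> str:
    """Run a simple multi-round debate and return the majority answer."""
    answers = [agent(question, []) for agent in agents]  # independent first answers
    for _ in range(rounds):
        # Each agent revises after seeing every *other* agent's previous answer.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Majority vote; Counter.most_common breaks ties by first occurrence.
    return Counter(answers).most_common(1)[0][0]


def stubborn(answer: str) -> Agent:
    """Toy agent that never changes its answer."""
    return lambda question, peers: answer


def conformist(initial: str) -> Agent:
    """Toy agent that adopts its peers' most common answer once it has seen any."""
    def agent(question: str, peers: list[str]) -> str:
        return Counter(peers).most_common(1)[0][0] if peers else initial
    return agent
```

For example, `debate("What is 2 + 2?", [stubborn("4"), stubborn("4"), conformist("5")])` returns `"4"`: the conformist starts at `"5"` but switches after the first round. Real debate systems would replace the toy agents with model calls and might use a supervisor agent instead of a plain vote.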

arxiv preprint arxiv, large language model, machine learning

2504.1452

Country:

North America > United States > Oklahoma > Payne County > Stillwater (0.54)
North America > United States > Oklahoma > Cleveland County > Norman (0.54)
North America > United States > California > Santa Clara County (0.54)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue

Li, Xiang, Pan, Duyi, Xiao, Hongru, Han, Jiale, Tang, Jing, Ma, Jiabao, Wang, Wei, Cheng, Bo

Speech synthesis is crucial for human-computer interaction, enabling natural and intuitive communication. However, existing datasets involve high construction costs due to manual annotation and suffer from limited character diversity, contextual scenarios, and emotional expressiveness. To address these issues, we propose DialogueAgents, a novel hybrid agent-based speech synthesis framework, which integrates three specialized agents -- a script writer, a speech synthesizer, and a dialogue critic -- to collaboratively generate dialogues. Grounded in a diverse character pool, the framework iteratively refines dialogue scripts and synthesizes speech based on speech review, boosting emotional expressiveness and paralinguistic features of the synthesized dialogues. Using DialogueAgents, we contribute MultiTalk, a bilingual, multi-party, multi-turn speech dialogue dataset covering diverse topics. Extensive experiments demonstrate the effectiveness of our framework and the high quality of the MultiTalk dataset. We release the dataset and code at https://github.com/uirlx/DialogueAgents to facilitate future research on advanced speech synthesis models and customized data generation.
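
The write, synthesize, and review cycle described above can be sketched as a bounded refinement loop. This is a minimal illustration, assuming hypothetical stand-ins (`write_script`, `synthesize`, `critique`) for the real LLM and text-to-speech calls; the byte-encoded "audio" and the one-revision critic are placeholders, not the DialogueAgents code.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    approved: bool
    feedback: str = ""


@dataclass
class Dialogue:
    script: list[str]
    audio: list[bytes] = field(default_factory=list)


def write_script(topic: str, characters: list[str], feedback: str) -> list[str]:
    """Hypothetical script writer: one line per character, revised when feedback is given."""
    note = " (more expressive)" if feedback else ""
    return [f"{name}: a line about {topic}{note}" for name in characters]


def synthesize(script: list[str]) -> list[bytes]:
    """Hypothetical speech synthesizer: encodes the text in place of real audio."""
    return [line.encode("utf-8") for line in script]


def critique(dialogue: Dialogue, round_no: int) -> Review:
    """Hypothetical dialogue critic: asks for one revision, then approves."""
    if round_no == 0:
        return Review(approved=False, feedback="increase emotional expressiveness")
    return Review(approved=True)


def generate_dialogue(topic: str, characters: list[str], max_rounds: int = 3) -> tuple[Dialogue, int]:
    """Write, synthesize, and review until the critic approves or the rounds run out."""
    feedback = ""
    dialogue = Dialogue(script=[])
    for round_no in range(max_rounds):
        dialogue = Dialogue(script=write_script(topic, characters, feedback))
        dialogue.audio = synthesize(dialogue.script)
        review = critique(dialogue, round_no)
        if review.approved:
            return dialogue, round_no + 1
        feedback = review.feedback  # passed back to the script writer next round
    return dialogue, max_rounds
```

Calling `generate_dialogue("travel", ["Alice", "Bob"])` takes two rounds: the critic rejects the first draft, and the revised script is approved. Capping the loop with `max_rounds` keeps a never-satisfied critic from stalling generation.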

artificial intelligence, dialogue, speech synthesis

2504.14482

Country: Asia > China (0.47)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Optimizing SIA Development: A Case Study in User-Centered Design for Estuary, a Multimodal Socially Interactive Agent Framework

Lin, Spencer, Jun, Miru, Rizk, Basem, Shieh, Karen, Fisher, Scott, Mozgai, Sharon

This case study presents our user-centered design model for Socially Intelligent Agent (SIA) development frameworks through our experience developing Estuary, an open source multimodal framework for building low-latency real-time socially interactive agents. We leverage the Rapid Assessment Process (RAP) to collect the thoughts of leading researchers in the field of SIAs regarding the current state of the art for SIA development as well as their evaluation of how well Estuary may potentially address current research gaps. We achieve this through a series of end-user interviews conducted by a fellow researcher in the community. We hope that the findings of our work will not only assist the continued development of Estuary but also guide the development of other future frameworks and technologies for SIAs.

artificial intelligence, estuary, human computer interaction

doi: 10.1145/3706599.3707399

2504.14427

Country: North America > United States > California > Los Angeles County > Los Angeles (0.29)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (0.86)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Cohen, Myke C., Grimm, David A., Mirsky, Reuth, Yin, Xiaoyun

Birds of a Different Feather Flock Together: Exploring Opportunities and Challenges in Animal-Human-Machine Teaming

Myke C. Cohen (1, 2), David A. Grimm (3), Reuth Mirsky (4), and Xiaoyun Yin (1)
(1) Arizona State University, Mesa, AZ; (2) Aptima, Inc., Woburn, MA; (3) Georgia Institute of Technology, Atlanta, GA; (4) Tufts University, Medford, MA

Abstract: Animal-Human-Machine (AHM) teams are a type of hybrid intelligence system wherein interactions between a human, AI-enabled machine, and animal members can result in unique capabilities greater than the sum of their parts. This paper calls for a systematic approach to studying the design of AHM team structures to optimize performance and overcome limitations in various applied settings. We consider the challenges and opportunities in investigating the synergistic potential of AHM team members by introducing a set of dimensions of AHM team functioning to effectively utilize each member's strengths while compensating for individual weaknesses. Using three representative examples of such teams (security screening, search-and-rescue, and guide dogs), the paper illustrates how AHM teams can tackle complex tasks. We conclude with open research directions that this multidimensional approach presents for studying hybrid human-AI systems beyond AHM teams.

Keywords: multi-agent systems, animal-human-machine teaming, functional allocation

1 Introduction
Consider a Blind or Visually Impaired (BVI) person training to be assisted by a guide dog. When the pair reaches an obstacle along their path, they

arXiv:2504.13973v1

ahm team, artificial intelligence, teammate

2504.13973

Country:

North America > United States > Massachusetts > Middlesex County > Woburn (0.24)
North America > United States > Massachusetts > Middlesex County > Medford (0.24)
North America > United States > Georgia > Fulton County > Atlanta (0.24)
North America > United States > Arizona > Maricopa County > Mesa (0.24)

Genre: Research Report (0.50)

Industry: Health & Medicine > Consumer Health (0.90)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Nardelli, Alice, Sgorbissa, Antonio, Recchiuto, Carmine Tommaso

Designing Empathetic Companions: Exploring Personality, Emotion, and Trust in Social Robots

How should a companion robot behave? In this research, we present a cognitive architecture based on a tailored personality model to investigate the impact of robotic personalities on the perception of companion robots. Drawing from existing literature, we identified empathy, trust, and enjoyability as key factors in building companionship with social robots. Based on these insights, we implemented a personality-dependent, emotion-aware generator, recognizing the crucial role of robot emotions in shaping these elements. We then conducted a user study involving 84 dyadic conversation sessions with the emotional robot Navel, which exhibited different personalities. Results were derived from a multimodal analysis, including questionnaires, open-ended responses, and behavioral observations. This approach allowed us to validate the developed emotion generator and explore the relationship between the personality traits of Agreeableness, Extraversion, Conscientiousness, and Empathy. Furthermore, we drew robust conclusions on how these traits influence relational trust, capability trust, enjoyability, and sociability.

artificial intelligence, personality, robot

2504.13964

Country: Europe (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry: Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Robots > Robots in the Home (0.81)

Evaluation and Incident Prevention in an Enterprise AI Assistant

Maharaj, Akash V., Arbour, David, Lee, Daniel, Bhattacharya, Uttaran, Rao, Anup, Zane, Austin, Feller, Avi, Qian, Kun, Li, Yunyao

Enterprise AI Assistants are increasingly deployed in domains where accuracy is paramount, making each erroneous output a potentially significant incident. This paper presents a comprehensive framework for monitoring, benchmarking, and continuously improving such complex, multi-component systems under active development by multiple teams. Our approach encompasses three key elements: (1) a hierarchical "severity" framework for incident detection that identifies and categorizes errors while attributing component-specific error rates, facilitating targeted improvements; (2) a scalable and principled methodology for benchmark construction, evaluation, and deployment, designed to accommodate multiple development teams, mitigate overfitting risks, and assess the downstream impact of system modifications; and (3) a continual improvement strategy leveraging multidimensional evaluation, enabling the identification and implementation of diverse enhancement opportunities. By adopting this holistic framework, organizations can systematically enhance the reliability and performance of their AI Assistants, ensuring their efficacy in critical enterprise environments. We conclude by discussing how this multifaceted evaluation approach opens avenues for various classes of enhancements, paving the way for more robust and trustworthy AI systems.
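
Element (1) attributes error rates to individual components by severity. A minimal sketch of that bookkeeping follows, assuming hypothetical severity labels (`critical`, `major`, `minor`), illustrative component names, and a flat ordered list standing in for the paper's hierarchy; the rate here is incidents divided by the requests each component handled.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical severity levels, ordered from most to least severe.
SEVERITY_LEVELS = ("critical", "major", "minor")


@dataclass(frozen=True)
class Incident:
    component: str  # e.g. "retrieval" or "generation" (illustrative names)
    severity: str   # one of SEVERITY_LEVELS


def error_rates(
    incidents: list[Incident], requests_per_component: dict[str, int]
) -> dict[str, dict[str, float]]:
    """Per component, the fraction of its requests that produced an incident at each severity."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for incident in incidents:
        if incident.severity not in SEVERITY_LEVELS:
            raise ValueError(f"unknown severity: {incident.severity}")
        counts[incident.component][incident.severity] += 1
    return {
        component: {level: counts[component][level] / total for level in SEVERITY_LEVELS}
        for component, total in requests_per_component.items()
    }
```

With one major and one minor retrieval incident over 200 retrieval requests, and one critical generation incident over 100 generation requests, the sketch reports a retrieval "major" rate of 0.005 and a generation "critical" rate of 0.01. Normalizing per component, rather than over all traffic, is what makes the rates comparable across components that see different request volumes.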

ai assistant, artificial intelligence, machine learning

doi: 10.1609/aaai.v39i28.35161

2504.13924

Country: North America > Mexico (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.84)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.67)

The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning

Lee, Dong Won, Kim, Yubin, Guvenoz, Denison, Jeong, Sooyeon, Malachowsky, Parker, Morency, Louis-Philippe, Breazeal, Cynthia, Park, Hae Won

Our work aims to advance the social reasoning of embodied artificial intelligence (AI) agents in real-world social interactions. Recently, language models (LMs) and foundational models (FMs) have been utilized as automatic evaluators of human-AI interactions, with the goal of eventually using them to improve the policy of the AI agent. To enable further research in this direction, we introduce a large-scale real-world Human Robot Social Interaction (HSRI) Dataset to benchmark the capabilities of LMs and FMs to identify and reason about social interactions, specifically with regard to robot social errors and competencies. Our dataset consists of 400 real-world human social robot interaction videos and over 10K annotations, detailing the robot's social errors, competencies, rationale, and corrective actions, capturing unique aspects of human-AI interaction only present in real-world interactions. To further assess AI models' ability to reason about social interactions, we propose eight new benchmark tasks centered around whether AI models can (1) evaluate social interactions via detecting social errors and competencies, (2) identify the explanatory factors associated with errors and competencies, (3) understand the flow of real-world social interactions, and (4) provide reasons and corrective actions for social errors. Human studies and experiments with modern LMs and FMs reveal that current models struggle with these tasks, demonstrating that our dataset and benchmark provide a step forward towards socially intelligent AI.

large language model, machine learning, natural language

2504.13898

Country: Europe (0.67)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Feng, Yuanjun, Chodhary, Vivek, Shrestha, Yash Raj

Human aversion? Do AI Agents Judge Identity More Harshly Than Performance

This study examines the understudied role of algorithmic evaluation of human judgment in hybrid decision-making systems, a critical gap in management research. While extant literature focuses on human reluctance to follow algorithmic advice, we reverse the perspective by investigating how AI agents based on large language models (LLMs) assess and integrate human input. Our work addresses a pressing managerial constraint: firms barred from deploying LLMs directly due to privacy concerns can still leverage them as mediating tools (for instance, anonymized outputs or decision pipelines) to guide high-stakes choices like pricing or discounts without exposing proprietary data. Through a controlled prediction task, we analyze how an LLM-based AI agent weights human versus algorithmic predictions. We find that the AI system systematically discounts human advice, penalizing human errors more severely than algorithmic errors, a bias exacerbated when the agent's identity (human vs AI) is disclosed and the human is positioned second. These results reveal a disconnect between AI-generated trust metrics and the actual influence of human judgment, challenging assumptions about equitable human-AI collaboration. Our findings offer three key contributions. First, we identify a reverse algorithm aversion phenomenon, where AI agents undervalue human input despite comparable error rates. Second, we demonstrate how disclosure and positional bias interact to amplify this effect, with implications for system design. Third, we provide a framework for indirect LLM deployment that balances predictive power with data privacy. For practitioners, this research emphasizes the need to audit AI weighting mechanisms, calibrate trust dynamics, and strategically design decision sequences in human-AI systems.
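
One simple way to quantify how an agent "weights human versus algorithmic predictions" (our assumption, related to the weight-of-advice measure from advice-taking research, and not necessarily the paper's metric) is to solve final = w * human + (1 - w) * algorithm for w on each trial. The function names below are hypothetical.

```python
def implied_human_weight(final: float, human: float, algorithm: float) -> float:
    """Weight w such that final = w * human + (1 - w) * algorithm."""
    if human == algorithm:
        raise ValueError("the weight is undefined when both predictions are equal")
    return (final - algorithm) / (human - algorithm)


def mean_human_weight(trials: list[tuple[float, float, float]]) -> float:
    """Average implied weight over (final, human, algorithm) trials."""
    weights = [implied_human_weight(f, h, a) for f, h, a in trials]
    return sum(weights) / len(weights)
```

If a human predicts 100, the algorithm predicts 120, and the agent settles on 110, the implied weight is 0.5; a final answer of 115 gives 0.25. Averaged over those two trials the weight is 0.375, so the agent leans toward the algorithmic prediction. Trials where both predictions coincide carry no information about the weighting and are rejected.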

large language model, machine learning, natural language

2504.13871

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.92)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

LangCoop: Collaborative Driving with Language

Gao, Xiangbo, Wu, Yuheng, Wang, Rujia, Liu, Chenxi, Zhou, Yang, Tu, Zhengzhong

Multi-agent collaboration holds great promise for enhancing the safety, reliability, and mobility of autonomous driving systems by enabling information sharing among multiple connected agents. However, existing multi-agent communication approaches are hindered by limitations of existing communication media, including high bandwidth demands, agent heterogeneity, and information loss. To address these challenges, we introduce LangCoop, a new paradigm for collaborative autonomous driving that leverages natural language as a compact yet expressive medium for inter-agent communication. LangCoop features two key innovations: Mixture Model Modular Chain-of-thought (M³CoT) for structured zero-shot vision-language reasoning and Natural Language Information Packaging (LangPack) for efficiently packaging information into concise, language-based messages. Through extensive experiments conducted in the CARLA simulations, we demonstrate that LangCoop achieves a remarkable 96% reduction in communication bandwidth (< 2KB per message) compared to image-based communication, while maintaining competitive driving performance in the closed-loop evaluation. Our project page and code are at https://xiangbogaobarry.github.io/LangCoop/.
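
The headline figures are a 96% bandwidth reduction and messages under 2 KB. As a back-of-the-envelope check, a 2 KB message that is 96% smaller than its image-based counterpart implies a baseline of about 2 / (1 - 0.96) = 50 KB. That baseline is derived from the two quoted numbers and is not reported in the abstract. The sketch below assumes a hypothetical compact JSON layout (fields `sender`, `scene`, `intent`, `hazards`), which is not LangPack's actual format, and reads "KB" as 1024 bytes. It shows how a per-message size budget could be enforced and how the reduction 1 - (language bytes / image bytes) is computed.

```python
import json

MAX_MESSAGE_BYTES = 2 * 1024  # the "< 2KB per message" figure, read as 1024-byte KB (assumption)


def pack_message(sender_id: str, scene: str, intent: str, hazards: list[str]) -> bytes:
    """Serialize a hypothetical language-based message and enforce the size budget."""
    payload = {"sender": sender_id, "scene": scene, "intent": intent, "hazards": hazards}
    # Compact separators drop the spaces json.dumps inserts by default.
    data = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    if len(data) > MAX_MESSAGE_BYTES:
        raise ValueError(f"message is {len(data)} bytes, over the {MAX_MESSAGE_BYTES}-byte budget")
    return data


def bandwidth_reduction(message_bytes: int, baseline_bytes: int) -> float:
    """Fractional reduction relative to a baseline such as an image-based message."""
    return 1 - message_bytes / baseline_bytes
```

For instance, `bandwidth_reduction(2_000, 50_000)` returns 0.96 (up to floating-point rounding), matching the quoted reduction for the hypothetical 50 KB baseline.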

arxiv preprint arxiv, large language model, machine learning

2504.13406

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology (0.90)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)