Goto

Collaborating Authors

 Agents


Validating Generative Agent-Based Models of Social Norm Enforcement: From Replication to Novel Predictions

arXiv.org Artificial Intelligence

As large language models (LLMs) advance, there is growing interest in using them to simulate human social behavior through generative agent-based modeling (GABM). However, validating these models remains a key challenge. We present a systematic two-stage validation approach using social dilemma paradigms from psychological literature, first identifying the cognitive components necessary for LLM agents to reproduce known human behaviors in mixed-motive settings from two landmark papers, then using the validated architecture to simulate novel conditions. Our model comparison of different cognitive architectures shows that both persona-based individual differences and theory of mind capabilities are essential for replicating third-party punishment (TPP) as a costly signal of trustworthiness. For the second study on public goods games, this architecture is able to replicate an increase in cooperation from the spread of reputational information through gossip. However, an additional strategic component is necessary to replicate the additional boost in cooperation rates in the condition that allows both ostracism and gossip. We then test novel predictions for each paper with our validated generative agents. We find that TPP rates significantly drop in settings where punishment is anonymous, yet a substantial amount of TPP persists, suggesting that both reputational and intrinsic moral motivations play a role in this behavior. For the second paper, we introduce a novel intervention and see that open discussion periods before rounds of the public goods game further increase contributions, allowing groups to develop social norms for cooperation. This work provides a framework for validating generative agent models while demonstrating their potential to generate novel and testable insights into human social behavior.


Reasoning Language Models for Root Cause Analysis in 5G Wireless Networks

arXiv.org Artificial Intelligence

Root Cause Analysis (RCA) in mobile networks remains a challenging task due to the need for interpretability, domain expertise, and causal reasoning. In this work, we propose a lightweight framework that leverages Large Language Models (LLMs) for RCA. To do so, we introduce TeleLogs, a curated dataset of annotated troubleshooting problems designed to benchmark RCA capabilities. Our evaluation reveals that existing open-source reasoning LLMs struggle with these problems, underscoring the need for domain-specific adaptation. To address this issue, we propose a two-stage training methodology that combines supervised fine-tuning with reinforcement learning to improve the accuracy and reasoning quality of LLMs. The proposed approach fine-tunes a series of RCA models to integrate domain knowledge and generate structured, multi-step diagnostic explanations, improving both interpretability and effectiveness. Extensive experiments across multiple LLM sizes show significant performance gains over state-of-the-art reasoning and non-reasoning models, including strong generalization to randomized test variants. These results demonstrate the promise of domain-adapted, reasoning-enhanced LLMs for practical and explainable RCA in network operation and management.


A Multi-Agent Generative AI Framework for IC Module-Level Verification Automation

arXiv.org Artificial Intelligence

As large language models demonstrate enormous potential in the field of Electronic Design Automation (EDA), generative AI-assisted chip design is attracting widespread attention from academia and industry. Although these technologies have made preliminary progress in tasks such as code generation, their application in chip verification -- a critical bottleneck in the chip development cycle -- remains at an exploratory stage. This paper proposes an innovative Multi-Agent Verification Framework (MAVF) aimed at addressing the limitations of current single-LLM approaches in complex verification tasks. Our framework builds an automated transformation system from design specifications to testbench through the collaborative work of multiple specialized agents, including specification parsing, verification strategy generation, and code implementation. Through verification experiments on multiple chip modules of varying complexity, results show that MAVF significantly outperforms traditional manual methods and single-dialogue generative AI approaches in verification document parsing and generation, as well as automated testbench generation. This research opens new directions for exploring generative AI applications in verification automation, potentially providing effective approaches to solving the most challenging bottleneck issues in chip design.


Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics

arXiv.org Artificial Intelligence

The development of reinforcement learning (RL) algorithms has been largely driven by ambitious challenge tasks and benchmarks. Games have dominated RL benchmarks because they present relevant challenges, are inexpensive to run and easy to understand. While games such as Go and Atari have led to many breakthroughs, they often do not directly translate to real-world embodied applications. In recognising the need to diversify RL benchmarks and addressing complexities that arise in embodied interaction scenarios, we introduce Assistax: an open-source benchmark designed to address challenges arising in assistive robotics tasks. Assistax uses JAX's hardware acceleration for significant speed-ups for learning in physics-based simulations. In terms of open-loop wall-clock time, Assistax runs up to $370\times$ faster when vectorising training runs compared to CPU-based alternatives. Assistax conceptualises the interaction between an assistive robot and an active human patient using multi-agent RL to train a population of diverse partner agents against which an embodied robotic agent's zero-shot coordination capabilities can be tested. Extensive evaluation and hyperparameter tuning for popular continuous control RL and MARL algorithms provide reliable baselines and establish Assistax as a practical benchmark for advancing RL research for assistive robotics. The code is available at: https://github.com/assistive-autonomy/assistax.


Decentralized Modeling of Vehicular Maneuvers and Interactions at Urban Junctions

arXiv.org Artificial Intelligence

Modeling and evaluation of automated vehicles (AVs) in mixed-autonomy traffic is essential prior to their safe and efficient deployment. This is especially important at urban junctions where complex multi-agent interactions occur. Current approaches for modeling vehicular maneuvers and interactions at urban junctions have limitations in formulating non-cooperative interactions and vehicle dynamics within a unified mathematical framework. Previous studies either assume predefined paths or rely on cooperation and central controllability, limiting their realism and applicability in mixed-autonomy traffic. This paper addresses these limitations by proposing a modeling framework for trajectory planning and decentralized vehicular control at urban junctions. The framework employs a bi-level structure where the upper level generates kinematically feasible reference trajectories using an efficient graph search algorithm with a custom heuristic function, while the lower level employs a predictive controller for trajectory tracking and optimization. Unlike existing approaches, our framework does not require central controllability or knowledge sharing among vehicles. The vehicle kinematics are explicitly incorporated at both levels, and acceleration and steering angle are used as control variables. This intuitive formulation facilitates analysis of traffic efficiency, environmental impacts, and motion comfort. The framework's decentralized structure accommodates operational and stochastic elements, such as vehicles' detection range, perception uncertainties, and reaction delay, making the model suitable for safety analysis. Numerical and simulation experiments across diverse scenarios demonstrate the framework's capability in modeling accurate and realistic vehicular maneuvers and interactions at various urban junctions, including unsignalized intersections and roundabouts.


Games Agents Play: Towards Transactional Analysis in LLM-based Multi-Agent Systems

arXiv.org Artificial Intelligence

Multi-Agent Systems (MAS) are increasingly used to simulate social interactions, but most of the frameworks miss the underlying cognitive complexity of human behavior. In this paper, we introduce Trans-ACT (Transactional Analysis Cognitive Toolkit), an approach embedding Transactional Analysis (TA) principles into MAS to generate agents with realistic psychological dynamics. Trans-ACT integrates the Parent, Adult, and Child ego states into an agent's cognitive architecture. Each ego state retrieves context-specific memories and uses them to shape response to new situations. The final answer is chosen according to the underlying life script of the agent. Our experimental simulation, which reproduces the Stupid game scenario, demonstrates that agents grounded in cognitive and TA principles produce deeper and context-aware interactions. Looking ahead, our research opens a new way for a variety of applications, including conflict resolution, educational support, and advanced social psychology studies.


Adaptive Cluster Collaborativeness Boosts LLMs Medical Decision Support Capacity

arXiv.org Artificial Intelligence

Abstract--Large language models (LLMs) have proven effective in artificial intelligence, where the multi-agent system (MAS) holds considerable promise for healthcare development by achieving the collaboration of LLMs. However, the absence of a systematic pipeline for agent construction and the rigidity of static collaboration patterns render current MAS-based models vulnerable to collaboration failures, resulting in substantial performance degradation in medical decision-making scenarios. T o this end, we propose a novel Masked Agent Collaboration (MAC) framework that harnesses Pareto-optimal agent construction and cross-consistency maximization mechanisms to achieve adaptive progressive propagation of collaborative information, boosting the medical decision-making capacity. Specifically, we first conduct a Pareto-frontier factors analysis towards the LLMs pool to consider their key factors, including the model size, inference time, diversity score, and throughput ratio, where we calculate the similarity between pairwise outputs within an LLM to derive its diversity score. Beyond this analysis, we enable the identification of Pareto-optimal models that balance efficiency and capability, which are subsequently selected as collaborative agents to consider the fundamental trade-offs inherent in practical LLM deployment. Afterward, we measure the pairwise similarity between the outputs from collaborative agents to determine their cross-consistency values, subsequently masking out the agent with the lowest cross-consistency value to eliminate the output that is likely semantically inconsistent. Finally, we conduct collaboration of agents by achieving adaptive progressive propagation, where each agent aggregates the outputs of unmasked agents from the previous layer as its input to generate the corresponding output via prompt engineering. Evaluations across three datasets confirm the effectiveness of our MAC, notably outperforming the multi-agent collaboration model (composed of 70B-141B open-access LLMs) by 16.55% and GPT -4 by 9.35% in Obstetrics and Gynecology on NEJMQA.


PhysVarMix: Physics-Informed Variational Mixture Model for Multi-Modal Trajectory Prediction

arXiv.org Machine Learning

Accurate prediction of future agent trajectories is a critical challenge for ensuring safe and efficient autonomous navigation, particularly in complex urban environments characterized by multiple plausible future scenarios. In this paper, we present a novel hybrid approach that integrates learning-based with physics-based constraints to address the multi-modality inherent in trajectory prediction. Our method employs a variational Bayesian mixture model to effectively capture the diverse range of potential future behaviors, moving beyond traditional unimodal assumptions. Unlike prior approaches that predominantly treat trajectory prediction as a data-driven regression task, our framework incorporates physical realism through sector-specific boundary conditions and Model Predictive Control (MPC)-based smoothing. These constraints ensure that predicted trajectories are not only data-consistent but also physically plausible, adhering to kinematic and dynamic principles. Furthermore, our method produces interpretable and diverse trajectory predictions, enabling enhanced downstream decision-making and planning in autonomous driving systems. We evaluate our approach on two benchmark datasets, demonstrating superior performance compared to existing methods. Comprehensive ablation studies validate the contributions of each component and highlight their synergistic impact on prediction accuracy and reliability. By balancing data-driven insights with physics-informed constraints, our approach offers a robust and scalable solution for navigating the uncertainties of real-world urban environments.


Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation

arXiv.org Artificial Intelligence

Nearly all human work is collaborative; thus, the evaluation of real-world NLP applications often requires multiple dimensions that align with diverse human perspectives. As real human evaluator resources are often scarce and costly, the emerging "LLM-as-a-judge" paradigm sheds light on a promising approach to leverage LLM agents to believably simulate human evaluators. Yet, to date, existing LLM-as-a-judge approaches face two limitations: persona descriptions of agents are often arbitrarily designed, and the frameworks are not generalizable to other tasks. To address these challenges, we propose MAJ-EVAL, a Multi-Agent-as-Judge evaluation framework that can automatically construct multiple evaluator personas with distinct dimensions from relevant text documents (e.g., research papers), instantiate LLM agents with the personas, and engage in-group debates with multi-agents to Generate multi-dimensional feedback. Our evaluation experiments in both the educational and medical domains demonstrate that MAJ-EVAL can generate evaluation results that better align with human experts' ratings compared with conventional automated evaluation metrics and existing LLM-as-a-judge methods.


Free Energy-Inspired Cognitive Risk Integration for AV Navigation in Pedestrian-Rich Environments

arXiv.org Artificial Intelligence

--Recent advances in autonomous vehicle (A V) behavior planning have shown impressive social interaction capabilities when interacting with other road users. However, achieving human-like prediction and decision-making in interactions with vulnerable road users remains a key challenge in complex multi-agent interactive environments. Existing research focuses primarily on crowd navigation for small mobile robots, which cannot be directly applied to A Vs due to inherent differences in their decision-making strategies and dynamic boundaries. Moreover, pedestrians in these multi-agent simulations follow fixed behavior patterns that cannot dynamically respond to A V actions. T o overcome these limitations, this paper proposes a novel framework for modeling interactions between the A V and multiple pedestrians. In this framework, a cognitive process modeling approach inspired by the Free Energy Principle is integrated into both the A V and pedestrian models to simulate more realistic interaction dynamics. Specifically, the proposed pedestrian Cognitive-Risk Social Force Model adjusts goal-directed and repulsive forces using a fused measure of cognitive uncertainty and physical risk to produce human-like trajectories. Meanwhile, the A V leverages this fused risk to construct a dynamic, risk-aware adjacency matrix for a Graph Convolutional Network within a Soft Actor-Critic architecture, allowing it to make more reasonable and informed decisions. Simulation results indicate that our proposed framework effectively improves safety, efficiency, and smoothness of A V navigation compared to the state-of-the-art method. N recent years, rapid advancements in autonomous driving technology have enabled autonomous vehicles (A V) to expand beyond simple, structured highway environments and to be increasingly deployed in more complex urban environments [1]. They are expected to become a key component of future urban transportation systems [2]. Unlike structured roads with clear traffic rules and physical lane separations, shared spaces in cities such as squares, campuses, and residential areas usually lack right-of-way regulations and explicit physical boundaries [3] between vehicles and pedestrians. Y afei Wang is with the School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (email: wyfjlu@sjtu.edu.cn) The high uncertainty and dynamic nature of human behavior [5], especially when multiple pedestrians are moving at the same time, significantly complicate the A V's decision-making process. To ensure both safety and efficiency, A Vs must make real-time decisions and continuously adapt their strategies in response to surrounding pedestrian behaviors.