Agents
Imitation Learning of Correlated Policies in Stackelberg Games
Wang, Kuang-Da, Hsieh, Ping-Chun, Peng, Wen-Chih
Stackelberg games, widely applied in domains like economics and security, involve asymmetric interactions where a leader's strategy drives follower responses. Accurately modeling these dynamics allows domain experts to optimize strategies in interactive scenarios, such as turn-based sports like badminton. In multi-agent systems, agent behaviors are interdependent, and traditional Multi-Agent Imitation Learning (MAIL) methods often fail to capture these complex interactions. Correlated policies, which account for opponents' strategies, are essential for accurately modeling such dynamics. However, even methods designed for learning correlated policies, like CoDAIL, struggle in Stackelberg games due to their asymmetric decision-making, where leaders and followers cannot simultaneously account for each other's actions, often leading to non-correlated policies. Furthermore, existing MAIL methods that match occupancy measures or use adversarial techniques like GAIL or Inverse RL face scalability challenges, particularly in high-dimensional environments, and suffer from unstable training. To address these challenges, we propose a correlated policy occupancy measure specifically designed for Stackelberg games and introduce the Latent Stackelberg Differential Network (LSDN) to match it. LSDN models two-agent interactions as shared latent state trajectories and uses multi-output Geometric Brownian Motion (MO-GBM) to effectively capture joint policies. By leveraging MO-GBM, LSDN disentangles environmental influences from agent-driven transitions in latent space, enabling the simultaneous learning of interdependent policies. This design eliminates the need for adversarial training and simplifies the learning process. Extensive experiments on Iterative Matrix Games and multi-agent particle environments demonstrate that LSDN can better reproduce complex interaction dynamics than existing MAIL methods.
Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy
Hou, Abe Bohan, Du, Hongru, Wang, Yichen, Zhang, Jingyu, Wang, Zixiao, Liang, Paul Pu, Khashabi, Daniel, Gardner, Lauren, He, Tianxing
Can we simulate a sandbox society with generative agents to model human behavior, thereby reducing the over-reliance on real human trials for assessing public policies? In this work, we investigate the feasibility of simulating health-related decision-making, using vaccine hesitancy, defined as the delay in acceptance or refusal of vaccines despite the availability of vaccination services (MacDonald, 2015), as a case study. To this end, we introduce the VacSim framework with 100 generative agents powered by Large Language Models (LLMs). VacSim simulates vaccine policy outcomes with the following steps: 1) instantiate a population of agents with demographics based on census data; 2) connect the agents via a social network and model vaccine attitudes as a function of social dynamics and disease-related information; 3) design and evaluate various public health interventions aimed at mitigating vaccine hesitancy. To align with real-world results, we also introduce simulation warmup and attitude modulation to adjust agents' attitudes. We propose a series of evaluations to assess the reliability of various LLM simulations. Experiments indicate that models like Llama and Qwen can simulate aspects of human behavior but also highlight real-world alignment challenges, such as inconsistent responses with demographic profiles. This early exploration of LLM-driven simulations is not meant to serve as definitive policy guidance; instead, it serves as a call for action to examine social simulation for policy development.
Adaptive AUV Hunting Policy with Covert Communication via Diffusion Model
Guo, Xu, Hou, Xiangwang, Xu, Minrui, Chen, Jianrui, Wang, Jingjing, Du, Jun, Ren, Yong
Collaborative underwater target hunting, facilitated by multiple autonomous underwater vehicles (AUVs), plays a significant role in various domains, especially military missions. Existing research predominantly focuses on designing efficient and high-success-rate hunting policy, particularly addressing the target's evasion capabilities. However, in real-world scenarios, the target can not only adjust its evasion policy based on its observations and predictions but also possess eavesdropping capabilities. If communication among hunter AUVs, such as hunting policy exchanges, is intercepted by the target, it can adapt its escape policy accordingly, significantly reducing the success rate of the hunting mission. To address this challenge, we propose a covert communication-guaranteed collaborative target hunting framework, which ensures efficient hunting in complex underwater environments while defending against the target's eavesdropping. To the best of our knowledge, this is the first study to incorporate the confidentiality of inter-agent communication into the design of target hunting policy. Furthermore, given the complexity of coordinating multiple AUVs in dynamic and unpredictable environments, we propose an adaptive multi-agent diffusion policy (AMADP), which incorporates the strong generative ability of diffusion models into the multi-agent reinforcement learning (MARL) algorithm. Experimental results demonstrate that AMADP achieves faster convergence and higher hunting success rates while maintaining covertness constraints.
TERL: Large-Scale Multi-Target Encirclement Using Transformer-Enhanced Reinforcement Learning
Zhang, Heng, Zhao, Guoxiang, Ren, Xiaoqiang
Pursuit-evasion (PE) problem is a critical challenge in multi-robot systems (MRS). While reinforcement learning (RL) has shown its promise in addressing PE tasks, research has primarily focused on single-target pursuit, with limited exploration of multi-target encirclement, particularly in large-scale settings. This paper proposes a Transformer-Enhanced Reinforcement Learning (TERL) framework for large-scale multi-target encirclement. By integrating a transformer-based policy network with target selection, TERL enables robots to adaptively prioritize targets and safely coordinate robots. Results show that TERL outperforms existing RL-based methods in terms of encirclement success rate and task completion time, while maintaining good performance in large-scale scenarios. Notably, TERL, trained on small-scale scenarios (15 pursuers, 4 targets), generalizes effectively to large-scale settings (80 pursuers, 20 targets) without retraining, achieving a 100% success rate.
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills
Yuan, Haoqi, Bai, Yu, Fu, Yuhui, Zhou, Bohan, Feng, Yicheng, Xu, Xinrun, Zhan, Yi, Karlsson, Bรถrje F., Lu, Zongqing
Building autonomous robotic agents capable of achieving human-level performance in real-world embodied tasks is an ultimate goal in humanoid robot research. Recent advances have made significant progress in high-level cognition with Foundation Models (FMs) and low-level skill development for humanoid robots. However, directly combining these components often results in poor robustness and efficiency due to compounding errors in long-horizon tasks and the varied latency of different modules. We introduce Being-0, a hierarchical agent framework that integrates an FM with a modular skill library. The FM handles high-level cognitive tasks such as instruction understanding, task planning, and reasoning, while the skill library provides stable locomotion and dexterous manipulation for low-level control. To bridge the gap between these levels, we propose a novel Connector module, powered by a lightweight vision-language model (VLM). The Connector enhances the FM's embodied capabilities by translating language-based plans into actionable skill commands and dynamically coordinating locomotion and manipulation to improve task success. With all components, except the FM, deployable on low-cost onboard computation devices, Being-0 achieves efficient, real-time performance on a full-sized humanoid robot equipped with dexterous hands and active vision. Extensive experiments in large indoor environments demonstrate Being-0's effectiveness in solving complex, long-horizon tasks that require challenging navigation and manipulation subtasks. For further details and videos, visit https://beingbeyond.github.io/being-0.
Negotiative Alignment: Embracing Disagreement to Achieve Fairer Outcomes -- Insights from Urban Studies
Mushkani, Rashid, Berard, Hugo, Koseki, Shin
Cities are not monolithic; they are arenas of negotiation among groups that hold varying needs, values, and experiences. Conventional methods of urban assessment -- from standardized surveys to AI-driven evaluations -- frequently rely on a single consensus metric (e.g., an average measure of inclusivity or safety). Although such aggregations simplify design decisions, they risk obscuring the distinct perspectives of marginalized populations. In this paper, we present findings from a community-centered study in Montreal involving 35 residents with diverse demographic and social identities, particularly wheelchair users, seniors, and LGBTQIA2+ individuals. Using rating and ranking tasks on 20 urban sites, we observe that disagreements are systematic rather than random, reflecting structural inequalities, differing cultural values, and personal experiences of safety and accessibility. Based on these empirical insights, we propose negotiative alignment, an AI framework that treats disagreement as an essential input to be preserved, analyzed, and addressed. Negotiative alignment builds on pluralistic models by dynamically updating stakeholder preferences through multi-agent negotiation mechanisms, ensuring no single perspective is marginalized. We outline how this framework can be integrated into urban analytics -- and other decision-making contexts -- to retain minority viewpoints, adapt to changing stakeholder concerns, and enhance fairness and accountability. The study demonstrates that preserving and engaging with disagreement, rather than striving for an artificial consensus, can produce more equitable and responsive AI-driven outcomes in urban design.
On Some Fundamental Problems for Multi-Agent Systems Over Multilayer Networks
Rosenkrantz, Daniel J., Marathe, Madhav V., Qiu, Zirou, Ravi, S. S., Stearns, Richard E.
Many researchers have considered multi-agent systems over single-layer networks as models for studying diffusion phenomena. Since real-world networks involve connections between agents with different semantics (e.g., family member, friend, colleague), the study of multi-agent systems over multilayer networks has assumed importance. Our focus is on one class of multi-agent system models over multilayer networks, namely multilayer synchronous dynamical systems (MSyDSs). We study several fundamental problems for this model. We establish properties of the phase spaces of MSyDSs and bring out interesting differences between single-layer and multilayer dynamical systems. We show that, in general, the problem of determining whether two given MSyDSs are inequivalent is NP-complete. This hardness result holds even when the only difference between the two systems is the local function at just one node in one layer. We also present efficient algorithms for the equivalence problem for restricted versions of MSyDSs (e.g., systems where each local function is a bounded-threshold function, systems where the number of layers is fixed and each local function is symmetric). In addition, we investigate the expressive power of MSyDSs based on the number of layers. In particular, we examine conditions under which a system with k >= 2 layers has an equivalent system with k-1 or fewer layers.
Agent-Based Simulation of UAV Battery Recharging for IoT Applications: Precision Agriculture, Disaster Recovery, and Dengue Vector Control
Grando, Leonardo, Jaramillo, Juan Fernando Galindo, Leite, Jose Roberto Emiliano, Ursini, Edson Luiz
The low battery autonomy of Unnamed Aerial Vehicles (UAVs or drones) can make smart farming (precision agriculture), disaster recovery, and the fighting against dengue vector applications difficult. This article considers two approaches, first enumerating the characteristics observed in these three IoT application types and then modeling an UAV's battery recharge coordination using the Agent-Based Simulation (ABS) approach. In this way, we propose that each drone inside the swarm does not communicate concerning this recharge coordination decision, reducing energy usage and permitting remote usage. A total of 6000 simulations were run to evaluate how two proposed policies, the BaseLine (BL) and ChargerThershold (CT) coordination recharging policy, behave in 30 situations regarding how each simulation sets conclude the simulation runs and how much time they work until recharging results. CT policy shows more reliable results in extreme system usage. This work conclusion presents the potential of these three IoT applications to achieve their perpetual service without communication between drones and ground stations. This work can be a baseline for future policies and simulation parameter enhancements.
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
Yue, Zhengrong, Zhuang, Shaobin, Li, Kunchang, Ding, Yanbo, Wang, Yali
Despite the recent advancement in video stylization, most existing methods struggle to render any video with complex transitions, based on an open style description of user query. To fill this gap, we introduce a generic multi-agent system for video stylization, V-Stylist, by a novel collaboration and reflection paradigm of multi-modal large language models. Specifically, our V-Stylist is a systematical workflow with three key roles: (1) Video Parser decomposes the input video into a number of shots and generates their text prompts of key shot content. Via a concise video-to-shot prompting paradigm, it allows our V-Stylist to effectively handle videos with complex transitions. (2) Style Parser identifies the style in the user query and progressively search the matched style model from a style tree. Via a robust tree-of-thought searching paradigm, it allows our V-Stylist to precisely specify vague style preference in the open user query. (3) Style Artist leverages the matched model to render all the video shots into the required style. Via a novel multi-round self-reflection paradigm, it allows our V-Stylist to adaptively adjust detail control, according to the style requirement. With such a distinct design of mimicking human professionals, our V-Stylist achieves a major breakthrough over the primary challenges for effective and automatic video stylization. Moreover,we further construct a new benchmark Text-driven Video Stylization Benchmark (TVSBench), which fills the gap to assess stylization of complex videos on open user queries. Extensive experiments show that, V-Stylist achieves the state-of-the-art, e.g.,V-Stylist surpasses FRESCO and ControlVideo by 6.05% and 4.51% respectively in overall average metrics, marking a significant advance in video stylization.
GameChat: Multi-LLM Dialogue for Safe, Agile, and Socially Optimal Multi-Agent Navigation in Constrained Environments
Mahadevan, Vagul, Zhang, Shangtong, Chandra, Rohan
Safe, agile, and socially compliant multi-robot navigation in cluttered and constrained environments remains a critical challenge. This is especially difficult with self-interested agents in decentralized settings, where there is no central authority to resolve conflicts induced by spatial symmetry. We address this challenge by proposing a novel approach, GameChat, which facilitates safe, agile, and deadlock-free navigation for both cooperative and self-interested agents. Key to our approach is the use of natural language communication to resolve conflicts, enabling agents to prioritize more urgent tasks and break spatial symmetry in a socially optimal manner. Our algorithm ensures subgame perfect equilibrium, preventing agents from deviating from agreed-upon behaviors and supporting cooperation. Furthermore, we guarantee safety through control barrier functions and preserve agility by minimizing disruptions to agents' planned trajectories. We evaluate GameChat in simulated environments with doorways and intersections. The results show that even in the worst case, GameChat reduces the time for all agents to reach their goals by over 35% from a naive baseline and by over 20% from SMG-CBF in the intersection scenario, while doubling the rate of ensuring the agent with a higher priority task reaches the goal first, from 50% (equivalent to random chance) to a 100% perfect performance at maximizing social welfare.