Agents
Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration
Wei, Zheng, Li, Mingchen, Zhang, Zeqian, Yuan, Ruibin, Hui, Pan, Qu, Huamin, Evans, James, Agrawala, Maneesh, Rao, Anyi
Recent advancements in multi-agent systems have demonstrated significant potential for enhancing creative task performance, such as long video generation. This study introduces three innovations to improve multi-agent collaboration. First, we propose OmniAgent, a hierarchical, graph-based multi-agent framework for long video generation that leverages a film-production-inspired architecture to enable modular specialization and scalable inter-agent collaboration. Second, inspired by context engineering, we propose hypergraph nodes that enable temporary group discussions among agents lacking sufficient context, reducing individual memory requirements while ensuring adequate contextual information. Third, we transition from directed acyclic graphs (DAGs) to directed cyclic graphs with limited retries, allowing agents to reflect and refine outputs iteratively, thereby improving earlier stages through feedback from subsequent nodes. These contributions lay the groundwork for developing more robust multi-agent systems in creative tasks.
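The third contribution, a directed cyclic graph with limited retries, can be illustrated with a minimal sketch. Everything named here is hypothetical: the stage names, the `run_pipeline` function, and the convention that a stage returns the name of a stage to revisit. The abstract does not describe OmniAgent's actual interfaces, so this only shows why a per-stage retry budget keeps a graph with back edges from looping forever.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical convention: a stage mutates the shared state and returns
# either None (continue forward) or the name of a stage to revisit.
Stage = Callable[[dict], Optional[str]]


def run_pipeline(stages: Dict[str, Stage], order: List[str],
                 state: dict, max_retries: int = 2) -> dict:
    """Run stages in order, following feedback (back) edges a bounded number of times."""
    retries = {name: 0 for name in order}
    i = 0
    while i < len(order):
        target = stages[order[i]](state)
        if target is not None and retries[target] < max_retries:
            # Back edge: jump to the requested stage and spend one retry.
            retries[target] += 1
            i = order.index(target)
        else:
            # No feedback, or the retry budget is exhausted: move forward.
            i += 1
    return state


def script(state: dict) -> Optional[str]:
    state["draft"] = state.get("draft", 0) + 1
    return None


def storyboard(state: dict) -> Optional[str]:
    state["shots"] = f"shots for draft {state['draft']}"
    return None


def review(state: dict) -> Optional[str]:
    # Toy critic: asks for a rewrite until the third draft.
    return "script" if state["draft"] < 3 else None


ORDER = ["script", "storyboard", "review"]
STAGES = {"script": script, "storyboard": storyboard, "review": review}

final_state = run_pipeline(STAGES, ORDER, {}, max_retries=2)
print(final_state)
```

Each back edge consumes one unit of a finite budget, so the loop takes at most `max_retries * len(order)` backward jumps before it has to finish. With `max_retries=2` the toy reviewer gets its third draft; with `max_retries=1` the run stops at the second draft even though the reviewer still objects.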
Group size effects and collective misalignment in LLM multi-agent systems
Flint, Ariel, Aiello, Luca Maria, Pastor-Satorras, Romualdo, Baronchelli, Andrea
Multi-agent systems of large language models (LLMs) are rapidly expanding across domains, introducing dynamics not captured by single-agent evaluations. Yet, existing work has mostly contrasted the behavior of a single agent with that of a collective of fixed size, leaving open a central question: how does group size shape dynamics? Here, we move beyond this dichotomy and systematically explore outcomes across the full range of group sizes. We focus on multi-agent misalignment, building on recent evidence that interacting LLMs playing a simple coordination game can generate collective biases absent in individual models. First, we show that collective bias is a deeper phenomenon than previously assessed: interaction can amplify individual biases, introduce new ones, or override model-level preferences. Second, we demonstrate that group size affects the dynamics in a non-linear way, revealing model-dependent dynamical regimes. Finally, we develop a mean-field analytical approach and show that, above a critical population size, simulations converge to deterministic predictions that expose the basins of attraction of competing equilibria. These findings establish group size as a key driver of multi-agent dynamics and highlight the need to consider population-level effects when deploying LLM-based systems at scale.
A Novel Multi-Timescale Stability-Preserving Hierarchical Reinforcement Learning Controller Framework for Adaptive Control in High-Dimensional Dynamical Systems
Khaniki, Mohammad Ali Labbaf, Taroodi, Fateme, Safizadeh, Benyamin
Controlling high-dimensional stochastic systems, critical in robotics, autonomous vehicles, and hyperchaotic systems, faces the curse of dimensionality, lacks temporal abstraction, and often fails to ensure stochastic stability. To overcome these limitations, this study introduces the Multi-Timescale Lyapunov-Constrained Hierarchical Reinforcement Learning (MTLHRL) framework. MTLHRL integrates a hierarchical policy within a semi-Markov Decision Process (SMDP), featuring a high-level policy for strategic planning and a low-level policy for reactive control, which effectively manages complex, multi-timescale decision-making and reduces dimensionality overhead. Stability is rigorously enforced using a neural Lyapunov function optimized via Lagrangian relaxation and multi-timescale actor-critic updates, ensuring mean-square boundedness or asymptotic stability in the face of stochastic dynamics. The framework promotes efficient and reliable learning through trust-region constraints and decoupled optimization. Extensive simulations on an 8D hyperchaotic system and a 5-DOF robotic manipulator demonstrate MTLHRL's empirical superiority. It significantly outperforms baseline methods in both stability and performance, recording the lowest error indices (e.g., Integral Absolute Error (IAE): 3.912 in hyperchaotic control and IAE: 1.623 in robotics), achieving faster convergence, and exhibiting superior disturbance rejection. MTLHRL offers a theoretically grounded and practically viable solution for robust control of complex stochastic systems.
CGoT: A Novel Inference Mechanism for Embodied Multi-Agent Systems Using Composable Graphs of Thoughts
Nie, Yixiao, Zhang, Yang, Jin, Yingjie, Wang, Zhepeng, Li, Xiu, Li, Xiang
The integration of self-driving cars and service robots is becoming increasingly prevalent across a wide array of fields, playing a crucial and expanding role in both industrial applications and everyday life. In parallel, the rapid advancements in Large Language Models (LLMs) have garnered substantial attention and interest within the research community. This paper introduces a novel vehicle-robot system that leverages the strengths of both autonomous vehicles and service robots. In our proposed system, two autonomous ego-vehicles transport service robots to locations within an office park, where the robots perform a series of tasks. The study explores the feasibility and potential benefits of incorporating LLMs into this system, with the aim of enhancing operational efficiency and maximizing the potential of the cooperative mechanisms between the vehicles and the robots. This paper proposes a novel inference mechanism, called CGoT, for this type of system, in which one agent can carry another. Experimental results are presented to validate the performance of the proposed method.
Rational Adversaries and the Maintenance of Fragility: A Game-Theoretic Theory of Rational Stagnation
Cooperative systems often remain in persistently suboptimal yet stable states. This paper explains such "rational stagnation" as an equilibrium sustained by a rational adversary whose utility follows the principle of potential loss, $u_{D} = U_{ideal} - U_{actual}$. Starting from the Prisoner's Dilemma, we show that the transformation $u_{i}' = a\,u_{i} + b\,u_{j}$ and the ratio of mutual recognition $w = b/a$ generate a fragile cooperation band $[w_{\min},\,w_{\max}]$ where both (C,C) and (D,D) are equilibria. Extending to a dynamic model with stochastic cooperative payoffs $R_{t}$ and intervention costs $(C_{c},\,C_{m})$, a Bellman-style analysis yields three strategic regimes: immediate destruction, rational stagnation, and intervention abandonment. The appendix further generalizes the utility to a reference-dependent nonlinear form and proves its stability under reference shifts, ensuring robustness of the framework. Applications to social-media algorithms and political trust illustrate how adversarial rationality can deliberately preserve fragility.
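The fragile cooperation band can be made concrete with a small worked example. The sketch below assumes $a > 0$, so dividing $u_{i}'$ by $a$ leaves each player maximizing $u_{i} + w\,u_{j}$, and it uses the usual Prisoner's Dilemma labels $T > R > P > S$. Under that assumption, (C,C) is a Nash equilibrium when $R + wR \ge T + wS$, i.e. $w \ge (T-R)/(R-S)$, and (D,D) is one when $P + wP \ge S + wT$, i.e. $w \le (P-S)/(T-P)$. These bounds are derived here from the Nash conditions; the abstract does not state the paper's own expressions for $w_{\min}$ and $w_{\max}$, and the function names and payoff values are illustrative.

```python
from fractions import Fraction


def cooperation_band(T, R, P, S):
    """Bounds on w = b/a derived from the Nash conditions (assumes a > 0)."""
    w_min = Fraction(T - R, R - S)  # (C,C) is stable for w >= w_min
    w_max = Fraction(P - S, T - P)  # (D,D) is stable for w <= w_max
    return w_min, w_max


def pure_equilibria(T, R, P, S, w):
    """Enumerate pure Nash equilibria of the transformed game u_i + w * u_j."""
    base = {("C", "C"): (R, R), ("C", "D"): (S, T),
            ("D", "C"): (T, S), ("D", "D"): (P, P)}

    def utility(player, profile):
        own, other = base[profile][player], base[profile][1 - player]
        return own + w * other

    equilibria = []
    for profile in base:
        stable = True
        for player in (0, 1):
            deviation = list(profile)
            deviation[player] = "D" if profile[player] == "C" else "C"
            # A strictly profitable unilateral deviation breaks the equilibrium.
            if utility(player, tuple(deviation)) > utility(player, profile):
                stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


# Illustrative payoffs, chosen so that the band is non-empty.
T, R, P, S = 4, 3, 2, 0
w_min, w_max = cooperation_band(T, R, P, S)
print(w_min, w_max)
for w in (Fraction(0), Fraction(1, 2), Fraction(2)):
    print(w, pure_equilibria(T, R, P, S, w))
```

With these payoffs the band is $[1/3,\,1]$: below it only (D,D) survives, above it only (C,C), and inside it both are equilibria. The band is not guaranteed to exist. For the payoffs $(T,R,P,S) = (5,3,1,0)$ the same formulas give $w_{\min} = 2/3 > w_{\max} = 1/4$, so no value of $w$ supports both equilibria at once.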
CreditXAI: A Multi-Agent System for Explainable Corporate Credit Rating
Shi, Yumeng, Yang, Zhongliang, Wang, Yisi, Zhou, Linna
In the domain of corporate credit rating, traditional deep learning methods have improved predictive accuracy but still suffer from the inherent 'black-box' problem and limited interpretability. While incorporating non-financial information enriches the data and provides partial interpretability, the models still lack hierarchical reasoning mechanisms, limiting their comprehensive analytical capabilities. To address these challenges, we propose CreditXAI, a Multi-Agent System (MAS) framework that simulates the collaborative decision-making process of professional credit analysts. The framework focuses on business, financial, and governance risk dimensions to generate consistent and interpretable credit assessments. Experimental results demonstrate that multi-agent collaboration improves predictive accuracy by more than 7% over the best single-agent baseline, confirming its significant synergistic advantage in corporate credit risk evaluation. This study provides a new technical pathway to build intelligent and interpretable credit rating models.
Right Place, Right Time: Market Simulation-based RL for Execution Optimisation
Olby, Ollie, Bacalum, Andreea, Baggott, Rory, Stillman, Namid
Execution algorithms are vital to modern trading: they enable market participants to execute large orders while minimising market impact and transaction costs. As these algorithms grow more sophisticated, optimising them becomes increasingly challenging. In this work, we present a reinforcement learning (RL) framework for discovering optimal execution strategies, evaluated within a reactive agent-based market simulator. This simulator creates reactive order flow and allows us to decompose slippage into its constituent components: market impact and execution risk. We assess the RL agent's performance using the efficient frontier based on work by Almgren and Chriss, measuring its ability to balance risk and cost. Results show that the RL-derived strategies consistently outperform baselines and operate near the efficient frontier, demonstrating a strong ability to optimise for risk and impact. These findings highlight the potential of reinforcement learning as a powerful tool in the trader's toolkit.
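The risk and cost trade-off behind the efficient frontier can be sketched with the discrete-time Almgren and Chriss model, assuming linear permanent impact (`gamma`), linear temporary impact (`eta`), and a fixed per-share cost (`epsilon`). The function name, the parameter values, and the two schedules compared are illustrative assumptions. This is the textbook closed form, not the paper's agent-based simulator or its slippage decomposition.

```python
def almgren_chriss_cost(holdings, tau, sigma, gamma, eta, epsilon=0.0):
    """Expected cost and variance of a liquidation schedule.

    holdings: shares still held at times 0..N, with holdings[0] = X and
              holdings[-1] = 0.
    tau:      length of one trading interval.
    sigma:    price volatility per unit time.
    """
    trades = [holdings[k - 1] - holdings[k] for k in range(1, len(holdings))]
    total_shares = holdings[0]
    eta_tilde = eta - 0.5 * gamma * tau
    expected_cost = (
        0.5 * gamma * total_shares ** 2              # permanent impact
        + epsilon * sum(abs(n) for n in trades)      # fixed cost per share
        + (eta_tilde / tau) * sum(n * n for n in trades)  # temporary impact
    )
    # Price risk accrues on whatever is still held after each trade.
    variance = sigma ** 2 * tau * sum(x * x for x in holdings[1:])
    return expected_cost, variance


# Two schedules for selling 1000 shares over four intervals.
twap = [1000, 750, 500, 250, 0]   # equal slices
immediate = [1000, 0, 0, 0, 0]    # everything in the first interval

params = dict(tau=1.0, sigma=1.0, gamma=0.0, eta=0.01)
print("TWAP:     ", almgren_chriss_cost(twap, **params))
print("Immediate:", almgren_chriss_cost(immediate, **params))
```

Selling everything at once removes price risk but pays the most temporary impact, while equal slices pay less impact and carry more variance. The efficient frontier is the set of schedules for which expected cost cannot be lowered without raising variance, and it is against that curve that a learned strategy would be compared.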
Solving Continuous Mean Field Games: Deep Reinforcement Learning for Non-Stationary Dynamics
Magnino, Lorenzo, Shao, Kai, Wu, Zida, Shen, Jiacheng, Laurière, Mathieu
Mean field games (MFGs) have emerged as a powerful framework for modeling interactions in large-scale multi-agent systems. Despite recent advancements in reinforcement learning (RL) for MFGs, existing methods are typically limited to finite spaces or stationary models, hindering their applicability to real-world problems. This paper introduces a novel deep reinforcement learning (DRL) algorithm specifically designed for non-stationary continuous MFGs. The proposed approach builds upon a Fictitious Play (FP) methodology, leveraging DRL for best-response computation and supervised learning for average policy representation. Furthermore, it learns a representation of the time-dependent population distribution using a Conditional Normalizing Flow. To validate the effectiveness of our method, we evaluate it on three different examples of increasing complexity. By addressing critical limitations in scalability and density approximation, this work represents a significant advancement in applying DRL techniques to complex MFG problems, bringing the field closer to real-world multi-agent systems.
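The Fictitious Play loop that the method builds on can be shown on a deliberately tiny example. The sketch below swaps the paper's setting for a static, two-location crowd-aversion game, where the cost of a location is the fraction of the population already there. An exact `min` stands in for the DRL best response, and a running average stands in for the learned average policy and the Conditional Normalizing Flow. The game, the function name, and the iteration count are all assumptions made for illustration.

```python
def fictitious_play(num_iterations=200):
    """Fictitious Play on a two-location crowd-aversion game.

    Returns the averaged population distribution over the two locations.
    """
    average = [1.0, 0.0]  # start with everyone at location 0
    for k in range(1, num_iterations + 1):
        # Cost of each location is its current crowding.
        costs = average
        # Best response against the averaged population (exact here).
        best = min(range(2), key=lambda a: costs[a])
        best_response = [1.0 if a == best else 0.0 for a in range(2)]
        # Running average of best responses with step size 1 / (k + 1).
        step = 1.0 / (k + 1)
        average = [m + step * (b - m) for m, b in zip(average, best_response)]
    return average


print(fictitious_play())
```

The averaged distribution oscillates around the even split with an amplitude that shrinks roughly like `1 / (k + 1)`, which is the equilibrium of this toy game. Continuous, non-stationary problems need function approximation precisely because neither the best response nor the averaged population can be tabulated like this.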
When UAV Swarm Meets IRS: Collaborative Secure Communications in Low-altitude Wireless Networks
Li, Jiahui, Liang, Xinyue, Sun, Geng, Kang, Hui, Wang, Jiacheng, Niyato, Dusit, Mao, Shiwen, Jamalipour, Abbas
Low-altitude wireless networks (LAWNs) represent a promising architecture that integrates unmanned aerial vehicles (UAVs) as aerial nodes to provide enhanced coverage, reliability, and throughput for diverse applications. However, these networks face significant security vulnerabilities from both known and potential unknown eavesdroppers, which may threaten data confidentiality and system integrity. To solve this critical issue, we propose a novel secure communication framework for LAWNs where the selected UAVs within a swarm function as a virtual antenna array (VAA), complemented by an intelligent reflecting surface (IRS) to create a robust defense against eavesdropping attacks. Specifically, we formulate a multi-objective optimization problem that simultaneously maximizes the secrecy rate while minimizing the maximum sidelobe level and total energy consumption, requiring joint optimization of UAV excitation current weights, flight trajectories, and IRS phase shifts. This problem presents significant difficulties due to the dynamic nature of the system and heterogeneous components. Thus, we first transform the problem into a heterogeneous Markov decision process (MDP). Then, we propose a heterogeneous multi-agent control approach (HMCA) that integrates a dedicated IRS control policy with a multi-agent soft actor-critic framework for UAV control, which enables coordinated operation across heterogeneous network elements. Simulation results show that the proposed HMCA achieves superior performance compared to baseline approaches in terms of secrecy rate improvement, sidelobe suppression, and energy efficiency. Furthermore, we find that the collaborative and passive beamforming synergy between VAA and IRS creates robust security guarantees when the number of UAVs increases.
Jiahui Li, Xinyue Liang, and Hui Kang are with the College of Computer Science and Technology, Jilin University, Changchun 130012, China (E-mails: lijiahui@jlu.edu.cn; Geng Sun is with the College of Computer Science and Technology, Jilin University, Changchun 130012, China, and also with the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China. He is also with the College of Computing and Data Science, Nanyang Technological University, Singapore 639798 (E-mail: sungeng@jlu.edu.cn).
STAR-RIS-assisted Collaborative Beamforming for Low-altitude Wireless Networks
Liang, Xinyue, Kang, Hui, Che, Junwei, Li, Jiahui, Sun, Geng, Wu, Qingqing, Wang, Jiacheng, Niyato, Dusit
While low-altitude wireless networks (LAWNs) based on uncrewed aerial vehicles (UAVs) offer high mobility, flexibility, and coverage for urban communications, they face severe signal attenuation in dense environments due to obstructions. To address this critical issue, we consider introducing collaborative beamforming (CB) of UAVs and omnidirectional reconfigurable beamforming (ORB) of simultaneous transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) to enhance the signal quality and directionality. On this basis, we formulate a joint rate and energy optimization problem (JREOP) to maximize the transmission rate of the overall system, while minimizing the energy consumption of the UAV swarm. Due to the non-convex and NP-hard nature of JREOP, we propose a heterogeneous multi-agent collaborative dynamic (HMCD) optimization framework, which has two core components. The first component is a simulated annealing (SA)-based STAR-RIS control method, which dynamically optimizes reflection and transmission coefficients to enhance signal propagation. The second component is an improved multi-agent deep reinforcement learning (MADRL) control method, which incorporates a self-attention evaluation mechanism to capture interactions between UAVs and an adaptive velocity transition mechanism to enhance training stability. Simulation results demonstrate that HMCD outperforms various baselines in terms of convergence speed, average transmission rate, and energy consumption. Further analysis reveals that the average transmission rate of the overall system scales positively with both UAV count and STAR-RIS element numbers. Index Terms: UAV, STAR-RIS, secure communications, collaborative beamforming, multi-agent deep reinforcement learning.
Xinyue Liang, Hui Kang, Junwei Che, and Jiahui Li are with the College of Computer Science and Technology, Jilin University, Changchun 130012, China (e-mails: xyliang25@mails.jlu.edu.cn; Geng Sun is with the College of Computer Science and Technology, Jilin University, Changchun 130012, China, and with Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China; he is also affiliated with the College of Computing and Data Science, Nanyang Technological University, Singapore 639798 (e-mail: sungeng@jlu.edu.cn).