Agent Societies
JaxLife: An Open-Ended Agentic Simulator
Lu, Chris, Beukman, Michael, Matthews, Michael, Foerster, Jakob
Human intelligence emerged through the process of natural selection and evolution on Earth. We investigate what it would take to re-create this process in silico. While past work has often focused on low-level processes (such as simulating physics or chemistry), we instead take a more targeted approach, aiming to evolve agents that can accumulate open-ended culture and technologies across generations. Towards this, we present JaxLife: an artificial life simulator in which embodied agents, parameterized by deep neural networks, must learn to survive in an expressive world containing programmable systems. First, we describe the environment and show that it can facilitate meaningful Turing-complete computation. We then analyze the evolved emergent agents' behavior, such as rudimentary communication protocols, agriculture, and tool use. Finally, we investigate how complexity scales with the amount of compute used. We believe JaxLife takes a step towards studying evolved behavior in more open-ended simulations. Our code is available at https://github.com/luchris429/JaxLife
BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems
Wang, Wei, Zhang, Dan, Feng, Tao, Wang, Boyan, Tang, Jie
Large Language Models (LLMs) are becoming increasingly powerful and capable of handling complex tasks, e.g., building single agents and multi-agent systems. Compared to single agents, multi-agent systems have higher requirements for the collaboration capabilities of language models. Many benchmarks are proposed to evaluate their collaborative abilities. However, these benchmarks lack fine-grained evaluations of LLM collaborative capabilities. Additionally, multi-agent collaborative and competitive scenarios are ignored in existing works. To address these two problems, we propose a benchmark, called BattleAgentBench, which defines seven sub-stages of three varying difficulty levels and conducts a fine-grained evaluation of language models in terms of single-agent scenario navigation capabilities, paired-agent task execution abilities, and multi-agent collaboration and competition capabilities. We conducted extensive evaluations on leading four closed-source and seven open-source models. Experimental results indicate that API-based models perform excellently on simple tasks but open-source small models struggle with simple tasks. Regarding difficult tasks that require collaborative and competitive abilities, although API-based models have demonstrated some collaborative capabilities, there is still enormous room for improvement.
Decentralized Unlabeled Multi-agent Pathfinding Via Target And Priority Swapping (With Supplementary)
Dergachev, Stepan, Yakovlev, Konstantin
In this paper we study a challenging variant of the multi-agent pathfinding problem (MAPF), when a set of agents must reach a set of goal locations, but it does not matter which agent reaches a specific goal - Anonymous MAPF (AMAPF). Current optimal and suboptimal AMAPF solvers rely on the existence of a centralized controller which is in charge of both target assignment and pathfinding. We extend the state of the art and present the first AMAPF solver capable of solving the problem at hand in a fully decentralized fashion, when each agent makes decisions individually and relies only on the local communication with the others. The core of our method is a priority and target swapping procedure tailored to produce consistent goal assignments (i.e. making sure that no two agents are heading towards the same goal). Coupled with an established rule-based path planning, we end up with a TP-SWAP, an efficient and flexible approach to solve decentralized AMAPF. On the theoretical side, we prove that TP-SWAP is complete (i.e. TP-SWAP guarantees that each target will be reached by some agent). Empirically, we evaluate TP-SWAP across a wide range of setups and compare it to both centralized and decentralized baselines. Indeed, TP-SWAP outperforms the fully-decentralized competitor and can even outperform the semi-decentralized one (i.e. the one relying on the initial consistent goal assignment) in terms of flowtime (a widespread cost objective in MAPF
Employing Artificial Intelligence to Steer Exascale Workflows with Colmena
Ward, Logan, Pauloski, J. Gregory, Hayot-Sasson, Valerie, Babuji, Yadu, Brace, Alexander, Chard, Ryan, Chard, Kyle, Thakur, Rajeev, Foster, Ian
Computational workflows are a common class of application on supercomputers, yet the loosely coupled and heterogeneous nature of workflows often fails to take full advantage of their capabilities. We created Colmena to leverage the massive parallelism of a supercomputer by using Artificial Intelligence (AI) to learn from and adapt a workflow as it executes. Colmena allows scientists to define how their application should respond to events (e.g., task completion) as a series of cooperative agents. In this paper, we describe the design of Colmena, the challenges we overcame while deploying applications on exascale systems, and the science workflows we have enhanced through interweaving AI. The scaling challenges we discuss include developing steering strategies that maximize node utilization, introducing data fabrics that reduce communication overhead of data-intensive tasks, and implementing workflow tasks that cache costly operations between invocations. These innovations coupled with a variety of application patterns accessible through our agent-based steering model have enabled science advances in chemistry, biophysics, and materials science using different types of AI. Our vision is that Colmena will spur creative solutions that harness AI across many domains of scientific computing.
On Centralized Critics in Multi-Agent Reinforcement Learning
Lyu, Xueguang, Baisero, Andrea, Xiao, Yuchen, Daley, Brett, Amato, Christopher
Centralized Training for Decentralized Execution, where agents are trained offline in a centralized fashion and execute online in a decentralized manner, has become a popular approach in Multi-Agent Reinforcement Learning (MARL). In particular, it has become popular to develop actor-critic methods that train decentralized actors with a centralized critic where the centralized critic is allowed access global information of the entire system, including the true system state. Such centralized critics are possible given offline information and are not used for online execution. While these methods perform well in a number of domains and have become a de facto standard in MARL, using a centralized critic in this context has yet to be sufficiently analyzed theoretically or empirically. In this paper, we therefore formally analyze centralized and decentralized critic approaches, and analyze the effect of using state-based critics in partially observable environments. We derive theories contrary to the common intuition: critic centralization is not strictly beneficial, and using state values can be harmful. We further prove that, in particular, state-based critics can introduce unexpected bias and variance compared to history-based critics. Finally, we demonstrate how the theory applies in practice by comparing different forms of critics on a wide range of common multi-agent benchmarks. The experiments show practical issues such as the difficulty of representation learning with partial observability, which highlights why the theoretical problems are often overlooked in the literature.
Multi-Agent Target Assignment and Path Finding for Intelligent Warehouse: A Cooperative Multi-Agent Deep Reinforcement Learning Perspective
Liu, Qi, Gao, Jianqi, Zhu, Dongjie, Pang, Xizheng, Chen, Pengbin, Guo, Jingxiang, Li, Yanjie
Multi-agent target assignment and path planning (TAPF) are two key problems in intelligent warehouse. However, most literature only addresses one of these two problems separately. In this study, we propose a method to simultaneously solve target assignment and path planning from a perspective of cooperative multi-agent deep reinforcement learning (RL). To the best of our knowledge, this is the first work to model the TAPF problem for intelligent warehouse to cooperative multi-agent deep RL, and the first to simultaneously address TAPF based on multi-agent deep RL. Furthermore, previous literature rarely considers the physical dynamics of agents. In this study, the physical dynamics of the agents is considered. Experimental results show that our method performs well in various task settings, which means that the target assignment is solved reasonably well and the planned path is almost shortest. Moreover, our method is more time-efficient than baselines.
Recursive Distributed Collaborative Aided Inertial Navigation
In this dissertation, we investigate the issue of robust localization in swarms of heterogeneous mobile agents with multiple and time-varying sensing modalities. Our focus is the development of filter-based and decoupled estimators under the assumption that agents possess communication and processing capabilities. Based on the findings from Distributed Collaborative State Estimation and modular sensor fusion, we propose a novel Kalman filter decoupling paradigm, which is termed Isolated Kalman Filtering (IKF). This paradigm is formally discussed and the treatment of delayed measurement is studied. The impact of approximation made was investigated on different observation graphs and the filter credibility was evaluated on a linear system in a Monte Carlo simulation. Finally, we propose a multi-agent modular sensor fusion approach based on the IKF paradigm, in order to cooperatively estimate the global state of a multi-agent system in a distributed way and fuse information provided by different on-board sensors in a computationally efficient way. As a consequence, this approach can be performed distributed among agents, while (i) communication between agents is only required at the moment of inter-agent joint observations, (ii) one agent acts as interim master to process state corrections isolated, (iii) agents can be added and removed from the swarm, (iv) each agent's full state can vary during mission (each local sensor suite can be truly modular), and (v) delayed and multi-rate sensor updates are supported. Extensive evaluation on realistic simulated and real-world data sets show that the proposed Isolated Kalman Filtering (IKF) paradigm, is applicable for both, truly modular single agent estimation and distributed collaborative multi-agent estimation problems.
Understanding Epistemic Language with a Bayesian Theory of Mind
Ying, Lance, Zhi-Xuan, Tan, Wong, Lionel, Mansinghka, Vikash, Tenenbaum, Joshua B.
How do people understand and evaluate claims about others' beliefs, even though these beliefs cannot be directly observed? In this paper, we introduce a cognitive model of epistemic language interpretation, grounded in Bayesian inferences about other agents' goals, beliefs, and intentions: a language-augmented Bayesian theory-of-mind (LaBToM). By translating natural language into an epistemic ``language-of-thought'', then evaluating these translations against the inferences produced by inverting a probabilistic generative model of rational action and perception, LaBToM captures graded plausibility judgments about epistemic claims. We validate our model in an experiment where participants watch an agent navigate a maze to find keys hidden in boxes needed to reach their goal, then rate sentences about the agent's beliefs. In contrast with multimodal LLMs (GPT-4o, Gemini Pro) and ablated models, our model correlates highly with human judgments for a wide range of expressions, including modal language, uncertainty expressions, knowledge claims, likelihood comparisons, and attributions of false belief.
Empirical Equilibria in Agent-based Economic systems with Learning agents
Dwarakanath, Kshama, Vyetrenko, Svitlana, Balch, Tucker
We present an agent-based simulator for economic systems with heterogeneous households, firms, central bank, and government agents. These agents interact to define production, consumption, and monetary flow. Each agent type has distinct objectives, such as households seeking utility from consumption and the central bank targeting inflation and production. We define this multi-agent economic system using an OpenAI Gym-style environment, enabling agents to optimize their objectives through reinforcement learning. Standard multi-agent reinforcement learning (MARL) schemes, like independent learning, enable agents to learn concurrently but do not address whether the resulting strategies are at equilibrium. This study integrates the Policy Space Response Oracle (PSRO) algorithm, which has shown superior performance over independent MARL in games with homogeneous agents, with economic agent-based modeling. We use PSRO to develop agent policies approximating Nash equilibria of the empirical economic game, thereby linking to economic equilibria. Our results demonstrate that PSRO strategies achieve lower regret values than independent MARL strategies in our economic system with four agent types. This work aims to bridge artificial intelligence, economics, and empirical game theory towards future research.
MegaAgent: A Practical Framework for Autonomous Cooperation in Large-Scale LLM Agent Systems
Wang, Qian, Wang, Tianyu, Li, Qinbin, Liang, Jingsheng, He, Bingsheng
With the emergence of large language models (LLMs), LLM-powered multi-agent systems (LLM-MA systems) have been proposed to tackle real-world tasks. However, their agents mostly follow predefined Standard Operating Procedures (SOPs) that remain unchanged across the whole interaction, lacking autonomy and scalability. Additionally, current solutions often overlook the necessity for effective agent cooperation. To address the above limitations, we propose MegaAgent, a practical framework designed for autonomous cooperation in large-scale LLM Agent systems. MegaAgent leverages the autonomy of agents to dynamically generate agents based on task requirements, incorporating features such as automatically dividing tasks, systematic planning and monitoring of agent activities, and managing concurrent operations. In addition, MegaAgent is designed with a hierarchical structure and employs system-level parallelism to enhance performance and boost communication. We demonstrate the effectiveness of MegaAgent through Gobang game development, showing that it outperforms popular LLM-MA systems; and national policy simulation, demonstrating its high autonomy and potential to rapidly scale up to 590 agents while ensuring effective cooperation among them. Our results indicate that MegaAgent is the first autonomous large-scale LLM-MA system with no pre-defined SOPs, high effectiveness and scalability, paving the way for further research in this field. Our code is at https://anonymous.4open.science/r/MegaAgent-81F3.