Agent Societies


MorphAgent: Empowering Agents through Self-Evolving Profiles and Decentralized Collaboration

arXiv.org Artificial Intelligence

The rapid advancement of Large Language Models (LLMs) (Achiam et al., 2023; Touvron et al., 2023b) has ushered in a new era of artificial intelligence, enabling sophisticated AI agents capable of tackling complex tasks across various domains (Nakajima, 2023; Torantulino, 2023). As these AI systems become more intricate, there is a growing need for effective mechanisms that allow multiple agents to work together. This collaborative approach, known as Multi-Agent Systems (MAS) (Han et al., 2024), has shown great promise in addressing challenges that are too complex or diverse for single-agent systems (Hong et al., 2024; Liu et al., 2023). Despite these promising results, existing MAS implementations often rely on predefined roles (Li et al., 2023), centralized coordination (Guo et al., 2024; Chen et al., 2024), or rigid organizational structures (Wang et al., 2024b; Hong et al., 2024). These design choices limit cooperative resilience within MAS (Chacon-Chamorro et al., 2024), that is, a system's robustness and adaptability in dynamic, unpredictable environments. Figure 1 presents two examples that illustrate these real-world challenges, elaborated below.

Example 1.1 (Domain shift). Domain shift refers to a change in the characteristics or requirements of a task as it progresses through different phases or contexts, presenting new challenges and requiring different skill sets. For instance, a scientific research project could begin with a literature review, move to experiment design, and conclude with result analysis and paper writing. These transitions demand a flexible, adaptive multi-agent system that can seamlessly adjust its collaborative strategies and agent roles as the task progresses.


Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation

arXiv.org Artificial Intelligence

We conducted experiments comparing the effectiveness of simpler versus more complex datasets at different stages of the post-training process, to better understand the optimal post-training strategy for large language models. Specifically, we compare two kinds of instructions: simple instructions (type 1) and specialized instructions (type 2). As shown in Table 10, performing SFT on simpler instructions helps the model establish a foundational level of instruction-following ability. This is reflected in moderate performance on AlpacaEval 2 (LC 16.25%, WR 17.62%) but lower performance on the more challenging Arena-Hard benchmark (WR 10.7%). When the model is fine-tuned on more specialized and complex data, the change is mixed: AlpacaEval 2 scores dip slightly (LC 14.70%, WR 16.01%), while Arena-Hard WR improves to 14.7%. The most significant performance gains come when DPO is applied after SFT. For example, SFT followed by DPO with complex, specialized instructions yields substantial improvements (LC 21.64%, WR 30.06%,
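
To make the comparison concrete, the snippet below tabulates only the scores quoted above and computes the change relative to SFT on simple (type 1) instructions. The configuration labels are hypothetical shorthand; the Arena-Hard score for SFT followed by DPO is left out because it is cut off in this excerpt.

```python
# Scores quoted in the text, in percent. The dictionary keys are illustrative labels.
SCORES = {
    "sft_type1": {"alpaca_lc": 16.25, "alpaca_wr": 17.62, "arena_wr": 10.7},
    "sft_type2": {"alpaca_lc": 14.70, "alpaca_wr": 16.01, "arena_wr": 14.7},
    # Arena-Hard WR for SFT+DPO is truncated in the excerpt, so it is omitted.
    "sft_dpo_type2": {"alpaca_lc": 21.64, "alpaca_wr": 30.06},
}


def deltas(base: str, other: str) -> dict[str, float]:
    """Per-metric difference (other - base), restricted to metrics both configs report."""
    shared = SCORES[base].keys() & SCORES[other].keys()
    return {m: round(SCORES[other][m] - SCORES[base][m], 2) for m in sorted(shared)}


if __name__ == "__main__":
    print(deltas("sft_type1", "sft_type2"))
    print(deltas("sft_type1", "sft_dpo_type2"))
```

On these numbers, switching SFT data from type 1 to type 2 lowers both AlpacaEval 2 scores by about 1.6 points while raising Arena-Hard WR by 4.0 points; adding DPO raises LC by 5.39 and WR by 12.44 points over the type 1 SFT baseline.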


A Survey of Multi-Agent Deep Reinforcement Learning with Communication

arXiv.org Artificial Intelligence

Communication is an effective mechanism for coordinating the behaviors of multiple agents, broadening their views of the environment, and supporting their collaboration. In the field of multi-agent deep reinforcement learning (MADRL), agents can improve overall learning performance and achieve their objectives through communication. Agents can communicate various types of messages, either to all agents or to specific agent groups, or conditioned on specific constraints. Despite the growing body of research on MADRL with communication (Comm-MADRL), there is no systematic and structured approach to distinguish and classify existing Comm-MADRL approaches. In this paper, we survey recent works in the Comm-MADRL field and consider various aspects of communication that can play a role in designing and developing multi-agent reinforcement learning systems. With these aspects in mind, we propose 9 dimensions along which Comm-MADRL approaches can be analyzed, developed, and compared. By projecting existing works into this multi-dimensional space, we discover interesting trends. We also propose novel directions for designing future Comm-MADRL systems by exploring possible combinations of the dimensions.
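
The three delivery patterns mentioned above (to all agents, to a specific group, or conditioned on a constraint) can be sketched as a small message router. This is a hypothetical illustration rather than one of the survey's nine dimensions; the `Agent` type, the group labels, and the distance-based range constraint are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    name: str
    group: str
    position: float  # 1-D position, used only by the constrained example
    inbox: list[str] = field(default_factory=list)


def deliver(sender: Agent, agents: list[Agent], content: str,
            recipient_filter: Callable[[Agent, Agent], bool]) -> list[str]:
    """Send `content` from `sender` to every other agent the filter accepts.

    Returns the names of the agents that received the message.
    """
    received = []
    for agent in agents:
        if agent is not sender and recipient_filter(sender, agent):
            agent.inbox.append(f"{sender.name}: {content}")
            received.append(agent.name)
    return received


def broadcast(sender: Agent, receiver: Agent) -> bool:
    return True  # every other agent receives the message


def same_group(sender: Agent, receiver: Agent) -> bool:
    return sender.group == receiver.group  # group-targeted communication


def within_range(max_dist: float) -> Callable[[Agent, Agent], bool]:
    # Constrained communication: a hypothetical maximum range between agents.
    return lambda sender, receiver: abs(sender.position - receiver.position) <= max_dist


if __name__ == "__main__":
    team = [Agent("a", "red", 0.0), Agent("b", "red", 5.0),
            Agent("c", "blue", 1.0), Agent("d", "blue", 9.0)]
    print(deliver(team[0], team, "enemy spotted", broadcast))
    print(deliver(team[0], team, "regroup", same_group))
    print(deliver(team[0], team, "need help", within_range(2.0)))
```

In learned Comm-MADRL systems the filter and message content would typically be produced by trained networks rather than fixed rules; the fixed filters here only show where such a decision sits.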


Large Language Model-driven Multi-Agent Simulation for News Diffusion Under Different Network Structures

arXiv.org Artificial Intelligence

The proliferation of fake news in the digital age has raised critical concerns, particularly regarding its impact on societal trust and democratic processes. Diverging from conventional agent-based simulation approaches, this work employs a large language model (LLM)-driven multi-agent simulation to replicate complex interactions within information ecosystems. We investigate key factors that facilitate news propagation, such as agent personalities and network structures, and evaluate strategies to combat misinformation. Through simulations across varying network structures, we demonstrate the potential of LLM-based agents for modeling the dynamics of misinformation spread and validate the influence of agent traits on the diffusion process. Our findings emphasize the advantages of LLM-based simulations over traditional techniques: they uncover underlying causes of information spread, such as agents promoting discussions, that go beyond the predefined rules typically employed in existing agent-based models. Additionally, we evaluate three countermeasure strategies and find that brute-force blocking of influential agents in the network or announcing news accuracy can effectively mitigate misinformation. However, their effectiveness depends on the network structure, highlighting the importance of considering network structure when developing future misinformation countermeasures.
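
For contrast with the LLM-driven simulation, the sketch below implements a conventional rule-based diffusion model, the independent cascade, together with the simplest form of the blocking countermeasure: removing the most connected agent. This is a swapped-in classical baseline, not the paper's method; the graph, sharing probability, and seed are illustrative assumptions. It uses the live-edge formulation, in which every directed sharing attempt is decided once up front, so the blocked and unblocked runs see the same random outcome.

```python
import random
from collections import deque


def live_edges(edges, p, seed):
    """Live-edge form of the independent cascade model.

    Each directed sharing attempt u -> v succeeds with probability p. The coin is
    flipped once, up front, so countermeasures can be compared on the same outcome.
    """
    rng = random.Random(seed)
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set())
        graph.setdefault(v, set())
        for src, dst in ((u, v), (v, u)):
            if rng.random() < p:
                graph[src].add(dst)
    return graph


def spread(graph, source, blocked=frozenset()):
    """Agents reached from `source` without passing through blocked agents."""
    if source in blocked:
        return set()
    reached, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in reached and nbr not in blocked:
                reached.add(nbr)
                queue.append(nbr)
    return reached


def most_connected(edges):
    """Agent with the highest degree in the full (unsampled) network."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return max(degree, key=degree.get)


if __name__ == "__main__":
    # A hub-and-spoke network with a short chain hanging off one spoke.
    edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (5, 6), (6, 7)]
    g = live_edges(edges, p=0.5, seed=42)
    hub = most_connected(edges)
    print(len(spread(g, 1)), len(spread(g, 1, blocked={hub})))
```

Because the live edges are fixed before the countermeasure is applied, blocking can only shrink the reached set. How much it shrinks depends on how much of the network routes through the blocked agent, which loosely mirrors the abstract's point that countermeasure effectiveness depends on network structure.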


Counterfactual Effect Decomposition in Multi-Agent Sequential Decision Making

arXiv.org Artificial Intelligence

We address the challenge of explaining counterfactual outcomes in multi-agent Markov decision processes. In particular, we aim to explain the total counterfactual effect of an agent's action on the outcome of a realized scenario through its influence on the environment dynamics and the agents' behavior. To achieve this, we introduce a novel causal explanation formula that decomposes the counterfactual effect by attributing to each agent and state variable a score reflecting their respective contributions to the effect. First, we show that the total counterfactual effect of an agent's action can be decomposed into two components: one measuring the effect that propagates through all subsequent agents' actions and the other measuring the effect that propagates through the state transitions. Building on recent advancements in causal contribution analysis, we further decompose these two effects as follows. For the former, we consider agent-specific effects, a causal concept that quantifies the counterfactual effect of an agent's action that propagates through a subset of agents. Based on this notion, we use the Shapley value to attribute the effect to individual agents. For the latter, we consider the concept of structure-preserving interventions and attribute the effect to state variables based on their "intrinsic" contributions. Through extensive experimentation, we demonstrate the interpretability of our decomposition approach in a Gridworld environment with LLM-assisted agents and a sepsis management simulator.

Applying counterfactual reasoning to retrospectively analyze the impact of different actions in decision-making scenarios is fundamental for accountability. To this end, many studies rely on the notion of total counterfactual effects, which quantify the extent to which an alternative action would have affected the outcome of a realized scenario.
In multi-agent sequential decision making, an agent's action typically affects the outcome indirectly. To illustrate this, consider the problem of AI-assisted decision making in healthcare (Lynn, 2019), where a clinician and their AI assistant treat a patient over a period of time.
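
The agent-level attribution relies on the Shapley value. Given a value function v(S) that assigns an effect to each subset S of agents (in the paper, the agent-specific effect that propagates through S), agent i receives its marginal contribution v(S ∪ {i}) − v(S) averaged over all orderings of the agents. The sketch below computes exact Shapley values by enumerating permutations. The three agents (a clinician, an AI assistant, and a hypothetical nurse) and their effect values are made up for illustration and do not come from the paper.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players to a number; value(frozenset()) is the baseline.
    """
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


# Hypothetical effects: the change in an outcome score when the counterfactual
# action is allowed to propagate only through the agents in each subset.
EFFECTS = {
    frozenset(): 0,
    frozenset({"clinician"}): 4,
    frozenset({"ai"}): 2,
    frozenset({"nurse"}): 0,
    frozenset({"clinician", "ai"}): 8,
    frozenset({"clinician", "nurse"}): 5,
    frozenset({"ai", "nurse"}): 2,
    frozenset({"clinician", "ai", "nurse"}): 9,
}

if __name__ == "__main__":
    phi = shapley_values(["clinician", "ai", "nurse"], EFFECTS.__getitem__)
    for agent, share in phi.items():
        print(agent, share)
```

With these made-up values the shares are 11/2 for the clinician, 3 for the AI assistant, and 1/2 for the nurse. They sum to v(all agents) − v(∅) = 9, the efficiency property that makes the Shapley value a complete decomposition of the effect. Enumerating permutations costs n!, so this exact version only suits a handful of agents.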


Corridor Generating Algorithm for Multi-Agent Pathfinding

arXiv.org Artificial Intelligence

In this paper, we address the classical Multi-Agent Pathfinding (MAPF) problem. Existing approaches struggle to solve dense MAPF instances. We propose a Corridor Generating Algorithm for MAPF, namely CGA-MAPF. In CGA-MAPF, agents build corridors (sets of connected vertices) from their current locations towards their goals and evacuate other agents out of the corridors to avoid collisions and deadlocks. The proposed algorithm has a reachability property, i.e., every agent is guaranteed to reach its goal location at some point. In the experimental section, we demonstrate that CGA-MAPF outperforms baseline algorithms in terms of success rate across diverse MAPF benchmark grids, achieving state-of-the-art performance.
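
As a rough illustration of the corridor idea, the sketch below builds a corridor as a shortest sequence of connected free cells from an agent's location to its goal (via breadth-first search on a 4-connected grid) and lists the other agents standing on it, which would have to be evacuated. It is a hypothetical fragment: the grid encoding and helper names are assumptions, and it omits CGA-MAPF's actual evacuation planning and the reachability guarantee.

```python
from collections import deque


def build_corridor(grid, start, goal):
    """Cells from start to goal (inclusive) found by BFS, or None if unreachable.

    `grid` is a list of strings where '#' marks an obstacle and '.' a free cell.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] == "#" or (nr, nc) in parent:
                continue
            parent[(nr, nc)] = cell
            queue.append((nr, nc))
    return None


def agents_to_evacuate(corridor, positions, owner):
    """Other agents whose current cell lies on `owner`'s corridor."""
    cells = set(corridor)
    return sorted(a for a, pos in positions.items() if a != owner and pos in cells)


if __name__ == "__main__":
    grid = [".....",
            "#.###"]  # a narrow passage with one side pocket at (1, 1)
    corridor = build_corridor(grid, (0, 0), (0, 4))
    print(corridor)
    print(agents_to_evacuate(corridor, {"A": (0, 0), "B": (0, 2), "C": (1, 1)}, "A"))
```

Here agent B blocks A's corridor, while C already sits in the side pocket, which is the kind of off-corridor cell an evacuation step would need to find for B.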


MFC-EQ: Mean-Field Control with Envelope Q-Learning for Moving Decentralized Agents in Formation

arXiv.org Artificial Intelligence

We study a decentralized version of Moving Agents in Formation (MAiF), a variant of Multi-Agent Path Finding that aims to plan collision-free paths for multiple agents with the dual objectives of reaching their goals quickly and maintaining a desired formation. The agents must balance these objectives under partial observation and limited communication. Formation maintenance depends on the joint state of all agents, whose state space grows exponentially with the number of agents, rendering the learning process intractable. Additionally, learning a single policy that can accommodate different linear preferences over these two objectives presents a significant challenge. In this paper, we propose Mean-Field Control with Envelope $Q$-learning (MFC-EQ), a scalable and adaptable learning framework for this bi-objective multi-agent problem. We approximate the dynamics of all agents using mean-field theory while learning a universal, preference-agnostic policy through envelope $Q$-learning. Our empirical evaluation of MFC-EQ across numerous instances shows that it outperforms state-of-the-art centralized MAiF baselines. Furthermore, MFC-EQ effectively handles more complex scenarios where the desired formation changes dynamically, a challenge that existing MAiF planners cannot address.
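
Two ingredients of the abstract can be made concrete with a toy reward. In a mean-field style summary, each agent measures its formation error against a slot placed around the group's mean position (one aggregate statistic) instead of reasoning over the full joint state, and a linear preference weight w in [0, 1] trades off goal progress against formation keeping. The reward shape, the slot offsets, and the name `scalarized_reward` are hypothetical; this is not MFC-EQ's actual reward, and the envelope $Q$-learning update itself is not shown.

```python
import math


def group_mean(positions):
    """Mean position of all agents: the single aggregate used for formation keeping."""
    n = len(positions)
    return (sum(x for x, _ in positions) / n, sum(y for _, y in positions) / n)


def formation_error(positions, offsets):
    """Distance of each agent from its desired slot, placed relative to the group mean."""
    cx, cy = group_mean(positions)
    return [math.dist(p, (cx + ox, cy + oy)) for p, (ox, oy) in zip(positions, offsets)]


def goal_distance(positions, goals):
    return [math.dist(p, g) for p, g in zip(positions, goals)]


def scalarized_reward(positions, goals, offsets, w):
    """Linear preference: w weights reaching goals, (1 - w) weights keeping formation."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [-(w * d + (1.0 - w) * e)
            for d, e in zip(goal_distance(positions, goals),
                            formation_error(positions, offsets))]


if __name__ == "__main__":
    positions = [(0.0, 0.0), (2.0, 0.0)]
    offsets = [(-1.0, 0.0), (1.0, 0.0)]  # a two-agent line formation
    goals = [(0.0, 3.0), (2.0, 4.0)]
    for w in (0.0, 0.5, 1.0):
        print(w, scalarized_reward(positions, goals, offsets, w))
```

A preference-agnostic policy, in the abstract's sense, would take w as an input and be trained across many values of it, rather than being retrained for each fixed weighting as this single reward function would require.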


Compositional Shielding and Reinforcement Learning for Multi-Agent Systems

arXiv.org Artificial Intelligence

Deep reinforcement learning has emerged as a powerful tool for obtaining high-performance policies. However, the safety of these policies has been a long-standing issue. One promising paradigm for guaranteeing safety is a shield, which prevents a policy from taking unsafe actions. However, the cost of computing a shield scales exponentially in the number of state variables, which is a particular concern in multi-agent systems with many agents. In this work, we propose a novel approach for multi-agent shielding. We address scalability by computing individual shields for each agent. The challenge is that typical safety specifications are global properties, but the shields of individual agents only ensure local properties. Our key idea for overcoming this challenge is to apply assume-guarantee reasoning. Specifically, we present a sound proof rule that decomposes a (global, complex) safety specification into (local, simple) obligations for the shields of the individual agents. Moreover, we show that applying the shields during reinforcement learning significantly improves the quality of the policies obtained for a given training budget. We demonstrate the effectiveness and scalability of our multi-agent shielding framework in two case studies, reducing the computation time from hours to seconds and achieving fast learning convergence.
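
In its simplest form, a shield is a filter between the policy and the environment: a proposed action that is safe passes through unchanged, and an unsafe one is replaced by a safe alternative. The sketch below gives each agent its own local shield on a 1-D track, in the spirit of computing individual shields per agent. The track, the forbidden cells, and the fallback order are hypothetical, and the paper's assume-guarantee proof rule is not implemented here.

```python
from typing import Callable

ACTIONS = (-1, 0, 1)  # move left, stay, move right on a 1-D track


def make_local_shield(track_len: int, forbidden: set[int]) -> Callable[[int, int], int]:
    """Build a shield for one agent; it only looks at that agent's own position."""

    def is_safe(pos: int, action: int) -> bool:
        nxt = pos + action
        return 0 <= nxt < track_len and nxt not in forbidden

    def shield(pos: int, proposed: int) -> int:
        if proposed not in ACTIONS:
            raise ValueError(f"unknown action {proposed}")
        if is_safe(pos, proposed):
            return proposed  # safe actions are left untouched
        for action in (0, -1, 1):  # fallback order: prefer staying put
            if is_safe(pos, action):
                return action
        raise RuntimeError(f"no safe action at position {pos}")

    return shield


if __name__ == "__main__":
    shield_a = make_local_shield(5, forbidden={3})
    print(shield_a(2, 1))   # stepping into cell 3 is overridden
    print(shield_a(2, -1))  # a safe action passes through
```

Because each shield checks only one agent's state, its cost grows with that agent's local state rather than with the joint state of all agents. Establishing a global property from such local shields is exactly what the abstract's assume-guarantee rule is for.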


The $s$-Energy and Its Applications

arXiv.org Artificial Intelligence

Averaging dynamics drives countless processes in physics, biology, engineering, and the social sciences. In recent years, the $s$-energy has emerged as a useful tool for bounding the convergence rates of time-varying averaging systems. We derive new bounds on the $s$-energy, which we use to resolve a number of open questions in the areas of bird flocking, opinion dynamics, and distributed motion coordination. We also use our results to provide a theoretical validation for the idea of the "Overton Window" as an attracting manifold of viable group opinions. Our new bounds on the $s$-energy highlight its dependency on the connectivity of the underlying networks. In this vein, we use the $s$-energy to explain the exponential gap in the convergence rates of stationary and time-varying consensus systems.
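
For intuition, the sketch below runs a simple averaging system, in which each agent on a ring repeatedly replaces its value with the average of itself and its two neighbors, and accumulates a truncated s-energy. It assumes the standard definition of the total s-energy, the sum over time steps t ≥ 0 and communication edges (i, j) of |x_i(t) − x_j(t)|^s. The ring, the value of s, and the horizon are illustrative choices, and the fixed graph here is the stationary case rather than the time-varying systems the abstract emphasizes.

```python
def ring_average_step(x):
    """One synchronous step: each agent averages itself with its two ring neighbors."""
    n = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]


def truncated_s_energy(x0, s, steps):
    """Sum of |x_i(t) - x_j(t)|**s over ring edges (i, i+1) for t = 0 .. steps-1.

    Returns the accumulated energy and the state after `steps` updates.
    Assumes at least 3 agents so that the ring neighbors are distinct.
    """
    n = len(x0)
    x = list(x0)
    energy = 0.0
    for _ in range(steps):
        energy += sum(abs(x[i] - x[(i + 1) % n]) ** s for i in range(n))
        x = ring_average_step(x)
    return energy, x


if __name__ == "__main__":
    x0 = [0.0, 0.0, 0.0, 0.0, 6.0]
    for steps in (10, 50, 200):
        energy, x = truncated_s_energy(x0, s=1.0, steps=steps)
        print(steps, round(energy, 6), [round(v, 6) for v in x])
```

Because this averaging matrix is doubly stochastic, the values converge to the initial mean (1.2 here) and the partial sums of the s-energy level off; bounds on the s-energy turn that kind of behavior into convergence-rate guarantees.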


CAMPHOR: Collaborative Agents for Multi-input Planning and High-Order Reasoning On Device

arXiv.org Artificial Intelligence

While server-side Large Language Models (LLMs) demonstrate proficiency in function calling and complex reasoning, deploying Small Language Models (SLMs) directly on devices brings opportunities to improve latency and privacy but also introduces unique challenges for accuracy and memory. We introduce CAMPHOR, an innovative on-device SLM multi-agent framework designed to handle multiple user inputs and reason over personal context locally, preserving privacy. CAMPHOR employs a hierarchical architecture in which a high-order reasoning agent decomposes complex tasks and coordinates expert agents responsible for personal context retrieval, tool interaction, and dynamic plan generation. By sharing parameters across agents and leveraging prompt compression, we significantly reduce model size, latency, and memory usage. To validate our approach, we present a novel dataset capturing multi-agent task trajectories centered on personalized mobile assistant use cases. Our experiments reveal that fine-tuned SLM agents not only surpass closed-source LLMs in task-completion F1 by ~35% but also eliminate the need for server-device communication, all while enhancing privacy.
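
The hierarchical architecture can be pictured as a dispatcher: a high-order reasoning agent turns a request into ordered subtasks and routes each one to an expert agent, passing earlier results forward. In the sketch below the decomposition is a hard-coded rule and the experts are plain functions standing in for fine-tuned SLMs; every name here (`Subtask`, the expert functions, the sample contact and request) is hypothetical and not CAMPHOR's interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Subtask:
    expert: str                          # which expert agent handles this step
    output_key: str                      # where the step's result is stored
    build_input: Callable[[dict], str]   # builds the step's input from earlier results


# Hypothetical expert agents for two of the roles named in the abstract.
def personal_context_expert(name: str) -> str:
    contacts = {"alice": "+1-555-0100"}  # stand-in for on-device personal data
    return contacts.get(name.lower(), "unknown")


def tool_expert(call: str) -> str:
    return f"called tool: {call}"  # a real system would execute the call


EXPERTS: dict[str, Callable[[str], str]] = {
    "context": personal_context_expert,
    "tool": tool_expert,
}


def decompose(request: str) -> list[Subtask]:
    """Toy stand-in for the high-order reasoning agent's plan.

    Assumes requests of the form "text <name> <message>".
    """
    verb, name, message = request.split(" ", 2)
    if verb != "text":
        raise ValueError(f"unsupported request: {request!r}")
    return [
        Subtask("context", "number", lambda results: name),
        Subtask("tool", "status",
                lambda results: f"send_sms({results['number']}, {message!r})"),
    ]


def orchestrate(request: str) -> dict[str, str]:
    """Run each subtask with its expert, feeding earlier results into later steps."""
    results: dict[str, str] = {}
    for step in decompose(request):
        results[step.output_key] = EXPERTS[step.expert](step.build_input(results))
    return results


if __name__ == "__main__":
    print(orchestrate("text Alice running late"))
```

Everything runs locally, so personal data such as the contact list never leaves the process; in the framework described above, the fixed `decompose` rule would be replaced by the reasoning agent and its dynamic plan generation.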