Goto

Collaborating Authors

 Agent Societies


The Odyssey of the Fittest: Can Agents Survive and Still Be Good?

arXiv.org Artificial Intelligence

As AI models grow in power and generality, understanding how agents learn and make decisions in complex environments is critical to promoting ethical behavior. This paper examines the ethical implications of implementing biological drives, specifically, self preservation, into three different agents. A Bayesian agent optimized with NEAT, a Bayesian agent optimized with stochastic variational inference, and a GPT 4o agent play a simulated, LLM generated text based adventure game. The agents select actions at each scenario to survive, adapting to increasingly challenging scenarios. Post simulation analysis evaluates the ethical scores of the agent's decisions, uncovering the tradeoffs they navigate to survive. Specifically, analysis finds that when danger increases, agents ignore ethical considerations and opt for unethical behavior. The agents' collective behavior, trading ethics for survival, suggests that prioritizing survival increases the risk of unethical behavior. In the context of AGI, designing agents to prioritize survival may amplify the likelihood of unethical decision making and unintended emergent behaviors, raising fundamental questions about goal design in AI safety research.


$TAR^2$: Temporal-Agent Reward Redistribution for Optimal Policy Preservation in Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

In cooperative multi-agent reinforcement learning (MARL), learning effective policies is challenging when global rewards are sparse and delayed. This difficulty arises from the need to assign credit across both agents and time steps, a problem that existing methods often fail to address in episodic, long-horizon tasks. We propose Temporal-Agent Reward Redistribution $TAR^2$, a novel approach that decomposes sparse global rewards into agent-specific, time-step-specific components, thereby providing more frequent and accurate feedback for policy learning. Theoretically, we show that $TAR^2$ (i) aligns with potential-based reward shaping, preserving the same optimal policies as the original environment, and (ii) maintains policy gradient update directions identical to those under the original sparse reward, ensuring unbiased credit signals. Empirical results on two challenging benchmarks, SMACLite and Google Research Football, demonstrate that $TAR^2$ significantly stabilizes and accelerates convergence, outperforming strong baselines like AREL and STAS in both learning speed and final performance. These findings establish $TAR^2$ as a principled and practical solution for agent-temporal credit assignment in sparse-reward multi-agent systems.


Deep Meta Coordination Graphs for Multi-agent Reinforcement Learning

arXiv.org Artificial Intelligence

This paper presents deep meta coordination graphs (DMCG) for learning cooperative policies in multi-agent reinforcement learning (MARL). Coordination graph formulations encode local interactions and accordingly factorize the joint value function of all agents to improve efficiency in MARL. However, existing approaches rely solely on pairwise relations between agents, which potentially oversimplifies complex multi-agent interactions. DMCG goes beyond these simple direct interactions by also capturing useful higher-order and indirect relationships among agents. It generates novel graph structures accommodating multiple types of interactions and arbitrary lengths of multi-hop connections in coordination graphs to model such interactions. It then employs a graph convolutional network module to learn powerful representations in an end-to-end manner. We demonstrate its effectiveness in multiple coordination problems in MARL where other state-of-the-art methods can suffer from sample inefficiency or fail entirely. All codes can be found here: https://github.com/Nikunj-Gupta/dmcg-marl.


Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control

arXiv.org Artificial Intelligence

Control policies that can achieve high task performance and satisfy safety constraints are desirable for any system, including multi-agent systems (MAS). One promising technique for ensuring the safety of MAS is distributed control barrier functions (CBF). However, it is difficult to design distributed CBF-based policies for MAS that can tackle unknown discrete-time dynamics, partial observability, changing neighborhoods, and input constraints, especially when a distributed high-performance nominal policy that can achieve the task is unavailable. To tackle these challenges, we propose DGPPO, a new framework that simultaneously learns both a discrete graph CBF which handles neighborhood changes and input constraints, and a distributed high-performance safe policy for MAS with unknown discrete-time dynamics. The results suggest that, compared with existing methods, our DGPPO framework obtains policies that achieve high task performance (matching baselines that ignore the safety constraints), and high safety rates (matching the most conservative baselines), with a constant set of hyperparameters across all environments. Multi-agent systems (MAS) have gained significant attention in recent years due to their potential applications in various domains such as warehouse robotics (Kattepur et al., 2018), autonomous vehicles (Shalev-Shwartz et al., 2016), traffic routing (Wu et al., 2020) and power systems Biagioni et al. (2022). However, a big challenge for MAS is designing distributed control policies that can achieve high task performance while ensuring safety, especially when the two are conflicting. In the single-agent continuous-time case, control barrier functions (CBF) are an effective tool to resolve the conflict via the solution of a safety filter quadratic program (QP) (Xu et al., 2015; Ames et al., 2017), minimally modifying a given performance-oriented nominal policy to be safe. While distributed CBFs have been proposed for the multi-agent (Wang et al., 2017) and partially observable cases (Zhang et al., 2025), they have a limitation of requiring known continuous-time dynamics and a nominal policy that can achieve high task performance (albeit not necessarily safely). While the aforementioned assumptions are reasonable for many applications, they do not apply when the dynamics are unknown and a performance-oriented nominal policy is not available. The challenge of requiring a nominal policy has been addressed by approaches that combine CBFs with reinforcement learning (RL) (Cheng et al., 2019; Emam et al., 2022), where the nominal policy is learned via an unconstrained RL algorithm to maximize task performance while the CBF is used as a safety filter to ensure safety.


Optimistic {\epsilon}-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

The Centralized Training with Decentralized Execution (CTDE) paradigm is widely used in cooperative multi-agent reinforcement learning. However, due to the representational limitations of traditional monotonic value decomposition methods, algorithms can underestimate optimal actions, leading policies to suboptimal solutions. To address this challenge, we propose Optimistic $\epsilon$-Greedy Exploration, focusing on enhancing exploration to correct value estimations. The underestimation arises from insufficient sampling of optimal actions during exploration, as our analysis indicated. We introduce an optimistic updating network to identify optimal actions and sample actions from its distribution with a probability of $\epsilon$ during exploration, increasing the selection frequency of optimal actions. Experimental results in various environments reveal that the Optimistic $\epsilon$-Greedy Exploration effectively prevents the algorithm from suboptimal solutions and significantly improves its performance compared to other algorithms.


Position: Emergent Machina Sapiens Urge Rethinking Multi-Agent Paradigms

arXiv.org Artificial Intelligence

Artificially intelligent (AI) agents that are capable of autonomous learning and independent decision-making hold great promise for addressing complex challenges across domains like transportation, energy systems, and manufacturing. However, the surge in AI systems' design and deployment driven by various stakeholders with distinct and unaligned objectives introduces a crucial challenge: how can uncoordinated AI systems coexist and evolve harmoniously in shared environments without creating chaos? To address this, we advocate for a fundamental rethinking of existing multi-agent frameworks, such as multi-agent systems and game theory, which are largely limited to predefined rules and static objective structures. We posit that AI agents should be empowered to dynamically adjust their objectives, make compromises, form coalitions, and safely compete or cooperate through evolving relationships and social feedback. Through this paper, we call for a shift toward the emergent, self-organizing, and context-aware nature of these systems.


Noncooperative Equilibrium Selection via a Trading-based Auction

arXiv.org Artificial Intelligence

Noncooperative multi-agent systems often face coordination challenges due to conflicting preferences among agents. In particular, agents acting in their own self-interest can settle on different equilibria, leading to suboptimal outcomes or even safety concerns. We propose an algorithm named trading auction for consensus (TACo), a decentralized approach that enables noncooperative agents to reach consensus without communicating directly or disclosing private valuations. TACo facilitates coordination through a structured trading-based auction, where agents iteratively select choices of interest and provably reach an agreement within an a priori bounded number of steps. A series of numerical experiments validate that the termination guarantees of TACo hold in practice, and show that TACo achieves a median performance that minimizes the total cost across all agents, while allocating resources significantly more fairly than baseline approaches.


Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer

arXiv.org Artificial Intelligence

Overestimation in single-agent reinforcement learning has been extensively studied. In contrast, overestimation in the multiagent setting has received comparatively little attention although it increases with the number of agents and leads to severe learning instability. Previous works concentrate on reducing overestimation in the estimation process of target Q-value. They ignore the follow-up optimization process of online Q-network, thus making it hard to fully address the complex multiagent overestimation problem. To solve this challenge, in this study, we first establish an iterative estimation-optimization analysis framework for multiagent value-mixing Q-learning. Our analysis reveals that multiagent overestimation not only comes from the computation of target Q-value but also accumulates in the online Q-network's optimization. Motivated by it, we propose the Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer algorithm to tackle multiagent overestimation from two aspects. First, we extend the random ensemble technique into the estimation of target individual and global Q-values to derive a lower update target. Second, we propose a novel hypernet regularizer on hypernetwork weights and biases to constrain the optimization of online global Q-network to prevent overestimation accumulation. Extensive experiments in MPE and SMAC show that the proposed method successfully addresses overestimation across various tasks.


CH-MARL: Constrained Hierarchical Multiagent Reinforcement Learning for Sustainable Maritime Logistics

arXiv.org Artificial Intelligence

The advent of globalized trade has led to unprecedented growth in the volume and complexity of maritime logistics. As one of the most cost-effective modes of transportation, maritime shipping has become indispensable for connecting economies and supporting international trade. However, this growth comes with substantial environmental and operational challenges. The sector's heavy reliance on fossil fuels contributes significantly to global greenhouse gas (GHG) emissions, accounting for nearly 2.89% of global emissions Smith et al. [2014], [IMO]. Moreover, the International Maritime Organization (IMO) has outlined a strategy to reduce GHG emissions from international shipping by at least 50% by 2050 compared to 2008 levels, aiming for eventual decarbonization [IMO]. These ambitious targets underscore the pressing need for transformative solutions to meet regulatory requirements and societal expectations. Environmental pressures are further compounded by the intricate logistics of coordinating diverse stakeholders, including shipping companies, port authorities, and policymakers, each with unique objectives and constraints.


Simulating Rumor Spreading in Social Networks using LLM Agents

arXiv.org Artificial Intelligence

With the rise of social media, misinformation has become increasingly prevalent, fueled largely by the spread of rumors. This study explores the use of Large Language Model (LLM) agents within a novel framework to simulate and analyze the dynamics of rumor propagation across social networks. To this end, we design a variety of LLM-based agent types and construct four distinct network structures to conduct these simulations. Our framework assesses the effectiveness of different network constructions and agent behaviors in influencing the spread of rumors. Our results demonstrate that the framework can simulate rumor spreading across more than one hundred agents in various networks with thousands of edges. The evaluations indicate that network structure, personas, and spreading schemes can significantly influence rumor dissemination, ranging from no spread to affecting 83\% of agents in iterations, thereby offering a realistic simulation of rumor spread in social networks.