AITopics

2410.21794

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Games (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

arXiv.org Artificial IntelligenceOct-28-2024

Multi-Agent Reinforcement Learning with Selective State-Space Models

Daniel, Jemma, de Kock, Ruan, Nessir, Louay Ben, Abramowitz, Sasha, Mahjoub, Omayma, Khlifi, Wiem, Formanek, Claude, Pretorius, Arnu

The Transformer model has demonstrated success across a wide range of domains, including in Multi-Agent Reinforcement Learning (MARL) where the Multi-Agent Transformer (MAT) has emerged as a leading algorithm in the field. However, a significant drawback of Transformer models is their quadratic computational complexity relative to input size, making them computationally expensive when scaling to larger inputs. This limitation restricts MAT's scalability in environments with many agents. Recently, State-Space Models (SSMs) have gained attention due to their computational efficiency, but their application in MARL remains unexplored. In this work, we investigate the use of Mamba, a recent SSM, in MARL and assess whether it can match the performance of MAT while providing significant improvements in efficiency. We introduce a modified version of MAT that incorporates standard and bi-directional Mamba blocks, as well as a novel "cross-attention" Mamba block. Extensive testing shows that our Multi-Agent Mamba (MAM) matches the performance of MAT across multiple standard multi-agent environments, while offering superior scalability to larger agent scenarios. This is significant for the MARL community, because it indicates that SSMs could replace Transformers without compromising performance, whilst also supporting more effective scaling to higher numbers of agents. Our project page is available at https://sites.google.com/view/multi-agent-mamba .

algorithm performance, machine learning, reinforcement learning, (17 more...)

2410.19382

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Zhong, Hai, Wang, Xun, Li, Zhuoran, Huang, Longbo

Offline-to-Online Reinforcement Learning has emerged as a powerful paradigm, leveraging offline data for initialization and online fine-tuning to enhance both sample efficiency and performance. However, most existing research has focused on single-agent settings, with limited exploration of the multi-agent extension, i.e., Offline-to-Online Multi-Agent Reinforcement Learning (O2O MARL). In O2O MARL, two critical challenges become more prominent as the number of agents increases: (i) the risk of unlearning pre-trained Q-values due to distributional shifts during the transition from offline-to-online phases, and (ii) the difficulty of efficient exploration in the large joint state-action space. To tackle these challenges, we propose a novel O2O MARL framework called Offline Value Function Memory with Sequential Exploration (OVMSE). First, we introduce the Offline Value Function Memory (OVM) mechanism to compute target Q-values, preserving knowledge gained during offline training, ensuring smoother transitions, and enabling efficient fine-tuning. Second, we propose a decentralized Sequential Exploration (SE) strategy tailored for O2O MARL, which effectively utilizes the pre-trained offline policy for exploration, thereby significantly reducing the joint state-action space to be explored. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) demonstrate that OVMSE significantly outperforms existing baselines, achieving superior sample efficiency and overall performance.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2410.1945

Country:

Asia > China > Beijing > Beijing (0.05)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (0.34)
Education > Educational Setting > Online (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Giardini, Guilherme S. Y., Hardy, John F. II, da Cunha, Carlo R.

Evolving Neural Networks Reveal Emergent Collective Behavior from Minimal Agent Interactions

Understanding the mechanisms behind emergent behaviors in multi-agent systems is critical for advancing fields such as swarm robotics and artificial intelligence. In this study, we investigate how neural networks evolve to control agents' behavior in a dynamic environment, focusing on the relationship between the network's complexity and collective behavior patterns. By performing quantitative and qualitative analyses, we demonstrate that the degree of network non-linearity correlates with the complexity of emergent behaviors. Simpler behaviors, such as lane formation and laminar flow, are characterized by more linear network operations, while complex behaviors like swarming and flocking show highly non-linear neural processing. Moreover, specific environmental parameters, such as moderate noise, broader field of view, and lower agent density, promote the evolution of non-linear networks that drive richer, more intricate collective behaviors. These results highlight the importance of tuning evolutionary conditions to induce desired behaviors in multi-agent systems, offering new pathways for optimizing coordination in autonomous swarms. Our findings contribute to a deeper understanding of how neural mechanisms influence collective dynamics, with implications for the design of intelligent, self-organizing systems.

artificial intelligence, machine learning, neural network, (18 more...)

2410.19718

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.87)

Avraham, Inbal, Mirsky, Reuth

Shared Control with Black Box Agents using Oracle Queries

Shared control problems involve a robot learning to collaborate with a human. When learning a shared control policy, short communication between the agents can often significantly reduce running times and improve the system's accuracy. We extend the shared control problem to include the ability to directly query a cooperating agent. We consider two types of potential responses to a query, namely oracles: one that can provide the learner with the best action they should take, even when that action might be myopically wrong, and one with a bounded knowledge limited to its part of the system. Given this additional information channel, this work further presents three heuristics for choosing when to query: reinforcement learning-based, utility-based, and entropy-based. These heuristics aim to reduce a system's overall learning cost. Empirical results on two environments show the benefits of querying to learn a better control policy and the tradeoffs between the proposed heuristics.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2410.19612

Country: Europe > Sweden > Stockholm > Stockholm (0.04)

Genre:

Overview (0.68)
Research Report (0.64)

Industry: Transportation > Air (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Maruo, Ryota, Takeuchi, Koh, Kashima, Hisashi

Learning Neural Strategy-Proof Matching Mechanism from Examples

Designing effective two-sided matching mechanisms is a major problem in mechanism design, and the goodness of matching cannot always be formulated. The existing work addresses this issue by searching over a parameterized family of mechanisms with certain properties by learning to fit a human-crafted dataset containing examples of preference profiles and matching results. However, this approach does not consider a strategy-proof mechanism, implicitly assumes the number of agents to be a constant, and does not consider the public contextual information of the agents. In this paper, we propose a new parametric family of strategy-proof matching mechanisms by extending the serial dictatorship (SD). We develop a novel attention-based neural network called NeuralSD, which can learn a strategy-proof mechanism from a human-crafted dataset containing public contextual information. NeuralSD is constructed by tensor operations that make SD differentiable and learns a parameterized mechanism by estimating an order of SD from the contextual information. We conducted experiments to learn a strategy-proof matching from matching examples with different numbers of agents. We demonstrated that our method shows the superiority of learning with context-awareness over a baseline in terms of regression performance and other metrics.

artificial intelligence, machine learning, mechanism, (17 more...)

2410.19384

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
South America > Brazil (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.54)

Meulemans, Alexander, Kobayashi, Seijin, von Oswald, Johannes, Scherrer, Nino, Elmoznino, Eric, Richards, Blake, Lajoie, Guillaume, Arcas, Blaise Agüera y, Sacramento, João

Multi-agent cooperation through learning-aware policy gradients

arXiv.org Artificial IntelligenceOct-24-2024

Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning. How can we achieve cooperation among self-interested, independent learning agents? Promising recent work has shown that in certain tasks cooperation can be established between learning-aware agents who model the learning dynamics of each other. Here, we present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning, which takes into account that other agents are themselves learning through trial and error based on multiple noisy trials. We then leverage efficient sequence models to condition behavior on long observation histories that contain traces of the learning dynamics of other agents. Training long-context policies with our algorithm leads to cooperative behavior and high returns on standard social dilemmas, including a challenging environment where temporally-extended action coordination is required. Finally, we derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2410.18636

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)

Kotecha, Niki, Chanona, Antonio del Rio

Leveraging Graph Neural Networks and Multi-Agent Reinforcement Learning for Inventory Control in Supply Chains

arXiv.org Artificial IntelligenceOct-24-2024

Inventory control in modern supply chains has attracted significant attention due to the increasing number of disruptive shocks and the challenges posed by complex dynamics, uncertainties, and limited collaboration. Traditional methods, which often rely on static parameters, struggle to adapt to changing environments. This paper proposes a Multi-Agent Reinforcement Learning (MARL) framework with Graph Neural Networks (GNNs) for state representation to address these limitations. Our approach redefines the action space by parameterizing heuristic inventory control policies, making it adaptive as the parameters dynamically adjust based on system conditions. By leveraging the inherent graph structure of supply chains, our framework enables agents to learn the system's topology, and we employ a centralized learning, decentralized execution scheme that allows agents to learn collaboratively while overcoming information-sharing constraints. Additionally, we incorporate global mean pooling and regularization techniques to enhance performance. We test the capabilities of our proposed approach on four different supply chain configurations and conduct a sensitivity analysis. This work paves the way for utilizing MARL-GNN frameworks to improve inventory management in complex, decentralized supply chain environments.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2410.18631

Country:

North America > United States > New York (0.04)
Europe > United Kingdom (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (0.46)
Energy (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceOct-23-2024

IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems

Mao, Yihuan, Kang, Yipeng, Li, Peilun, Zhang, Ning, Xu, Wei, Zhang, Chongjie

As large language model (LLM) agents increasingly integrate into our infrastructure, their robust coordination and message synchronization become vital. The Byzantine Generals Problem (BGP) is a critical model for constructing resilient multi-agent systems (MAS) under adversarial attacks. It describes a scenario where malicious agents with unknown identities exist in the system-situations that, in our context, could result from LLM agents' hallucinations or external attacks. In BGP, the objective of the entire system is to reach a consensus on the action to be taken. Traditional BGP requires global consensus among all agents; however, in practical scenarios, global consensus is not always necessary and can even be inefficient. Therefore, there is a pressing need to explore a refined version of BGP that aligns with the local coordination patterns observed in MAS. We refer to this refined version as Imperfect BGP (IBGP) in our research, aiming to address this discrepancy. To tackle this issue, we propose a framework that leverages consensus protocols within general MAS settings, providing provable resilience against communication attacks and adaptability to changing environments, as validated by empirical results. Additionally, we present a case study in a sensor network environment to illustrate the practical application of our protocol.

agent, artificial intelligence, protocol, (17 more...)

2410.16237

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Michigan > Wayne County > Detroit (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (0.66)
Government > Military (0.49)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

arXiv.org Artificial IntelligenceOct-23-2024

Scalable Offline Reinforcement Learning for Mean Field Games

Brunnbauer, Axel, Lemmel, Julian, Babaiee, Zahra, Neubauer, Sophie, Grosu, Radu

Reinforcement learning algorithms for mean-field games offer a scalable framework for optimizing policies in large populations of interacting agents. Existing methods often depend on online interactions or access to system dynamics, limiting their practicality in real-world scenarios where such interactions are infeasible or difficult to model. In this paper, we present Offline Munchausen Mirror Descent (Off-MMD), a novel mean-field RL algorithm that approximates equilibrium policies in mean-field games using purely offline data. By leveraging iterative mirror descent and importance sampling techniques, Off-MMD estimates the mean-field distribution from static datasets without relying on simulation or environment dynamics. Additionally, we incorporate techniques from offline reinforcement learning to address common issues like Q-value overestimation, ensuring robust policy learning even with limited data coverage. Our algorithm scales to complex environments and demonstrates strong performance on benchmark tasks like crowd exploration or navigation, highlighting its applicability to real-world multi-agent systems where online experimentation is infeasible. We empirically demonstrate the robustness of Off-MMD to low-quality datasets and conduct experiments to investigate its sensitivity to hyperparameter choices.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2410.17898

Country:

Europe > Austria > Vienna (0.15)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)