AITopics

2410.02516

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.04)
North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Transportation (0.97)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.35)

arXiv.org Artificial IntelligenceOct-3-2024

Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration

Qu, Yun, Wang, Boyuan, Jiang, Yuhang, Shao, Jianzhun, Mao, Yixiu, Wang, Cheems, Liu, Chang, Ji, Xiangyang

With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, redundant efforts brought by exploration without proper guidance choices poses a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, choosing to channel informative task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from LLM into symbolic key states, that are critical for task fulfillment, in a discriminative manner at low LLM inference costs. To unleash the power of key states, we design Subspace-based Hindsight Intrinsic Reward (SHIR) to guide agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to track transitions between key states in a specific task for organized exploration. Benefiting from diminishing redundant explorations, LEMAE outperforms existing SOTA approaches on the challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.

arxiv preprint arxiv, exploration, reinforcement learning, (11 more...)

2410.02511

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report > New Finding (0.92)

Industry: Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

arXiv.org Artificial IntelligenceOct-3-2024

Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems

Zhang, Guibin, Yue, Yanwei, Li, Zhixun, Yun, Sukwon, Wan, Guancheng, Wang, Kun, Cheng, Dawei, Yu, Jeffrey Xu, Chen, Tianlong

Recent advancements in large language model (LLM)-powered agents have shown that collective intelligence can significantly outperform individual capabilities, largely attributed to the meticulously designed inter-agent communication topologies. Though impressive in performance, existing multi-agent pipelines inherently introduce substantial token overhead, as well as increased economic costs, which pose challenges for their large-scale deployments. In response to this challenge, we propose an economical, simple, and robust multi-agent communication framework, termed $\texttt{AgentPrune}$, which can seamlessly integrate into mainstream multi-agent systems and prunes redundant or even malicious communication messages. Technically, $\texttt{AgentPrune}$ is the first to identify and formally define the \textit{communication redundancy} issue present in current LLM-based multi-agent pipelines, and efficiently performs one-shot pruning on the spatial-temporal message-passing graph, yielding a token-economic and high-performing communication topology. Extensive experiments across six benchmarks demonstrate that $\texttt{AgentPrune}$ \textbf{(I)} achieves comparable results as state-of-the-art topologies at merely $\$5.6$ cost compared to their $\$43.7$, \textbf{(II)} integrates seamlessly into existing multi-agent frameworks with $28.1\%\sim72.8\%\downarrow$ token reduction, and \textbf{(III)} successfully defend against two types of agent-based adversarial attacks with $3.5\%\sim10.8\%\uparrow$ performance boost.

agent, agentprune, topology, (13 more...)

2410.02506

Country:

North America > United States > Montana (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > United States > North Carolina (0.04)
(3 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Education (0.67)
Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Bui, The Viet, Nguyen, Thanh Hong, Mai, Tien

ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization

Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets without the need for further environmental interactions. While promising results have been demonstrated in single-agent settings, offline multi-agent reinforcement learning (MARL) presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors. A key issue in offline RL is the distributional shift, which arises when the target policy being optimized deviates from the behavior policy that generated the data. This problem is exacerbated in MARL due to the interdependence between agents' local policies and the expansive joint state-action space. Prior approaches have primarily addressed this challenge by incorporating regularization in the space of either Q-functions or policies. In this work, we introduce a regularizer in the space of stationary distributions to better handle distributional shift. Our algorithm, ComaDICE, offers a principled framework for offline cooperative MARL by incorporating stationary distribution regularization for the global learning policy, complemented by a carefully structured multi-agent value decomposition strategy to facilitate multi-agent training. Through extensive experiments on the multi-agent MuJoCo and StarCraft II benchmarks, we demonstrate that ComaDICE achieves superior performance compared to state-of-the-art offline MARL methods across nearly all tasks. Over the years, deep RL has achieved remarkable success in various decision-making tasks (Levine et al., 2016; Silver et al., 2017; Kalashnikov et al., 2018; Haydari & Yılmaz, 2020). However, a significant limitation of deep RL is its need for millions of interactions with the environment to gather experiences for policy improvement.

algorithm, comadice and baseline, reinforcement learning, (9 more...)

2410.01954

Country:

North America > United States > Oregon > Lane County > Eugene (0.14)
Asia > Singapore (0.04)
North America > United States > Ohio > Lucas County > Oregon (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

MARPLE: A Benchmark for Long-Horizon Inference

Jin, Emily, Huang, Zhuoyi, Fränken, Jan-Philipp, Liu, Weiyu, Cha, Hannah, Brockbank, Erik, Wu, Sarah, Zhang, Ruohan, Wu, Jiajun, Gerstenberg, Tobias

Reconstructing past events requires reasoning across long time horizons. To figure out what happened, we need to use our prior knowledge about the world and human behavior and draw inferences from various sources of evidence including visual, language, and auditory cues. We introduce MARPLE, a benchmark for evaluating long-horizon inference capabilities using multi-modal evidence. Our benchmark features agents interacting with simulated households, supporting vision, language, and auditory stimuli, as well as procedurally generated environments and agent behaviors. Inspired by classic ``whodunit'' stories, we ask AI models and human participants to infer which agent caused a change in the environment based on a step-by-step replay of what actually happened. The goal is to correctly identify the culprit as early as possible. Our findings show that human participants outperform both traditional Monte Carlo simulation methods and an LLM baseline (GPT-4) on this task. Compared to humans, traditional inference models are less robust and performant, while GPT-4 has difficulty comprehending environmental changes. We analyze what factors influence inference performance and ablate different modes of evidence, finding that all modes are valuable for performance. Overall, our experiments demonstrate that the long-horizon, multimodal inference tasks in our benchmark present a challenge to current models.

agent, scenario, trajectory, (14 more...)

2410.01926

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Veerapaneni, Rishi, Saleem, Muhammad Suhail, Li, Jiaoyang, Likhachev, Maxim

Windowed MAPF with Completeness Guarantees

Traditional multi-agent path finding (MAPF) methods try to compute entire start-goal paths which are collision free. However, computing an entire path can take too long for MAPF systems where agents need to replan fast. Methods that address this typically employ a "windowed" approach and only try to find collision free paths for a small windowed timestep horizon. This adaptation comes at the cost of incompleteness; all current windowed approaches can become stuck in deadlock or livelock. Our main contribution is to introduce our framework, WinC-MAPF, for Windowed MAPF that enables completeness. Our framework uses heuristic update insights from single-agent real-time heuristic search algorithms as well as agent independence ideas from MAPF algorithms. We also develop Single-Step CBS (SS-CBS), an instantiation of this framework using a novel modification to CBS. We show how SS-CBS, which only plans a single step and updates heuristics, can effectively solve tough scenarios where existing windowed approaches fail.

agent, penalty, ss-cbs, (15 more...)

2410.01798

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)

Performant, Memory Efficient and Scalable Multi-Agent Reinforcement Learning

Mahjoub, Omayma, Abramowitz, Sasha, de Kock, Ruan, Khlifi, Wiem, Toit, Simon du, Daniel, Jemma, Nessir, Louay Ben, Beyers, Louise, Formanek, Claude, Clark, Liam, Pretorius, Arnu

As the field of multi-agent reinforcement learning (MARL) progresses towards larger and more complex environments, achieving strong performance while maintaining memory efficiency and scalability to many agents becomes increasingly important. Although recent research has led to several advanced algorithms, to date, none fully address all of these key properties simultaneously. In this work, we introduce Sable, a novel and theoretically sound algorithm that adapts the retention mechanism from Retentive Networks to MARL. Sable's retention-based sequence modelling architecture allows for computationally efficient scaling to a large number of agents, as well as maintaining a long temporal context, making it well-suited for large-scale partially observable environments. Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in the majority of tasks (34 out of 45, roughly 75%). Furthermore, Sable demonstrates stable performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable's performance gains and confirm its efficient computational memory usage. Our results highlight Sable's performance and efficiency, positioning it as a leading approach to MARL at scale. When considering large-scale practical applications of multi-agent reinforcement learning (MARL) such as autonomous driving (Lian & Deshmukh, 2006; Zhou et al., 2021; Li et al., 2022) and electricity grid control (Kamboj et al., 2011; Li et al., 2016), it becomes increasingly important to maintain three key properties for a system to be effective: strong performance, memory efficiency, and scalability to many agents. Although many existing MARL approaches exhibit one or two of these properties, a solution effectively encompassing all three remains elusive. To briefly illustrate our point, we consider the spectrum of approaches to MARL. Such algorithms demonstrate proficiency in handling many agents in a memory efficient way by typically using shared parameters and conditioning on an agent identifier. However, at scale, the performance of fully decentralised methods remains suboptimal compared to more centralised approaches (Papoudakis et al., 2021; Yu et al., 2022; Wen et al., 2022). Between decentralised and centralised methods, lie CTDE approaches (Lowe et al., 2017; Papoudakis et al., 2021; Yu et al., 2022).

agent, sable, timestep, (13 more...)

2410.01706

Genre: Research Report > New Finding (0.65)

Industry:

Information Technology (0.66)
Transportation > Ground > Road (0.48)
Automobiles & Trucks (0.48)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.86)

Zhao, Hongrui, Ivanovic, Boris, Mehr, Negar

Distributed NeRF Learning for Collaborative Multi-Robot Perception

arXiv.org Artificial IntelligenceSep-30-2024

Effective environment perception is crucial for enabling downstream robotic applications. Individual robotic agents often face occlusion and limited visibility issues, whereas multi-agent systems can offer a more comprehensive mapping of the environment, quicker coverage, and increased fault tolerance. In this paper, we propose a collaborative multi-agent perception system where agents collectively learn a neural radiance field (NeRF) from posed RGB images to represent a scene. Each agent processes its local sensory data and shares only its learned NeRF model with other agents, reducing communication overhead. Given NeRF's low memory footprint, this approach is well-suited for robotic systems with limited bandwidth, where transmitting all raw data is impractical. Our distributed learning framework ensures consistency across agents' local NeRF models, enabling convergence to a unified scene representation. We show the effectiveness of our method through an extensive set of experiments on datasets containing challenging real-world scenes, achieving performance comparable to centralized mapping of the environment where data is sent to a central server for processing. Additionally, we find that multi-agent learning provides regularization benefits, improving geometric consistency in scenarios with sparse input views. We show that in such scenarios, multi-agent mapping can even outperform centralized training.

agent, consensus, dataset, (16 more...)

2409.20289

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.82)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Lu, Junlin, Mannion, Patrick, Mason, Karl

Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning

arXiv.org Artificial IntelligenceSep-30-2024

Many decision-making problems feature multiple objectives where it is not always possible to know the preferences of a human or agent decision-maker for different objectives. However, demonstrated behaviors from the decision-maker are often available. This research proposes a dynamic weight-based preference inference (DWPI) algorithm that can infer the preferences of agents acting in multi-objective decision-making problems from demonstrations. The proposed algorithm is evaluated on three multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item Gathering, and is compared to two existing preference inference algorithms. Empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time efficiency and inference accuracy. The DWPI algorithm maintains its performance when inferring preferences for sub-optimal demonstrations. Moreover, the DWPI algorithm does not necessitate any interactions with the user during inference - only demonstrations are required. We provide a correctness proof and complexity analysis of the algorithm and statistically evaluate the performance under different representation of demonstrations.

algorithm, demonstration, sub-optimal demonstration, (16 more...)

doi: 10.1007/s00521-024-10412-x

2409.20258

Country:

Europe > Ireland (0.14)
Europe > Latvia > Riga Municipality > Riga (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

arXiv.org Artificial IntelligenceSep-29-2024

Enabling Multi-Robot Collaboration from Single-Human Guidance

Ji, Zhengran, Zhang, Lingyu, Sajda, Paul, Chen, Boyuan

The best policy achieves an average seeker success rate of 84.2% in simulation and 80% in real-world experiments in a challenging 3 seekers vs 3 hiders setting with random map layouts. In comparison, the baseline policy has only 36.4% in simulation and 55% in real-world. Interesting collaborative behaviors among seekers are observed during deployment, such as strategically navigating to anticipate and intercept hiders or effectively blocking key paths as a team. Abstract -- Learning collaborative behaviors is essential for multi-agent systems. Traditionally, multi-agent reinforcement learning solves this implicitly through a joint reward and centralized observations, assuming collaborative behavior will emerge. Other studies propose to learn from demonstrations of a group of collaborative experts. Instead, we propose an efficient and explicit way of learning collaborative behaviors in multi-agent systems by leveraging expertise from only a single human. Our insight is that humans can naturally take on various roles in a team. We show that agents can effectively learn to collaborate by allowing a human operator to dynamically switch between controlling agents for a short period and incorporating a human-like theory-of-mind model of teammates. Our experiments showed that our method improves the success rate of a challenging collaborative hide-and-seek task by up to 58 % with only 40 minutes of single-human guidance.

agent, representation, seeker, (14 more...)

2409.19831

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)