SEAC
Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework by combining the gradients of different agents. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms several baselines and state-of-the-art algorithms, learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.
[Supplemental PDF: figure captions describing the shelf-delivery warehouse environment (agents must put down a previously delivered shelf before picking up a new one) and the four level-based foraging variants of Figure 9, the last of which is fully cooperative, plus the experiment hyperparameters in Table 2.]
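To make the gradient-combination idea above concrete, here is a minimal PyTorch sketch of one way experience sharing can be written: each agent's actor-critic loss folds in every other agent's transitions, reweighted by the importance ratio between the two policies. The network sizes, the sharing weight `LAMBDA`, and the toy batch format are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of shared-experience actor-critic losses (assumptions noted above).
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, N_ACTIONS = 2, 8, 5
LAMBDA = 1.0  # weight on the shared-experience terms (hyperparameter)

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(),
                                   nn.Linear(64, N_ACTIONS))
        self.critic = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(),
                                    nn.Linear(64, 1))

    def evaluate(self, obs, act):
        dist = torch.distributions.Categorical(logits=self.actor(obs))
        return dist.log_prob(act), self.critic(obs).squeeze(-1)

agents = [ActorCritic() for _ in range(N_AGENTS)]

def seac_loss(i, batches):
    """Actor-critic loss for agent i, including importance-weighted
    terms computed from every other agent's experience."""
    total = 0.0
    for k, (obs, act, ret) in enumerate(batches):
        logp_i, v_i = agents[i].evaluate(obs, act)
        adv = (ret - v_i).detach()
        if k == i:
            weight, coef = 1.0, 1.0            # agent i's own data
        else:
            with torch.no_grad():              # behaviour policy of agent k
                logp_k, _ = agents[k].evaluate(obs, act)
            weight = (logp_i - logp_k).detach().exp()  # importance ratio
            coef = LAMBDA
        actor_loss = -(weight * logp_i * adv).mean()
        critic_loss = (weight * (ret - v_i) ** 2).mean()
        total = total + coef * (actor_loss + 0.5 * critic_loss)
    return total

# Toy usage: one (observation, action, return) batch per agent.
batches = [(torch.randn(16, OBS_DIM),
            torch.randint(N_ACTIONS, (16,)),
            torch.randn(16)) for _ in range(N_AGENTS)]
loss = sum(seac_loss(i, batches) for i in range(N_AGENTS))
loss.backward()
```

Detaching the importance ratio keeps the off-policy correction as a fixed weight on each agent's own policy-gradient and value-loss terms, so every agent still trains its own networks while effectively seeing several times more data.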
Deployable Reinforcement Learning with Variable Control Rate
Wang, Dong; Beltrame, Giovanni
Deploying controllers trained with Reinforcement Learning (RL) on real robots can be challenging: RL relies on agents' policies being modeled as Markov Decision Processes (MDPs), which assume an inherently discrete passage of time. As a consequence, nearly all RL-based control systems employ a fixed-rate control strategy, with a period (or time step) typically chosen based on the developer's experience or on specific characteristics of the application environment. Unfortunately, the system must then be controlled at the highest, worst-case frequency to ensure stability, which can demand significant computational and energy resources and hinder the deployability of the controller on onboard hardware. Adhering to the principles of reactive programming, we surmise that applying control actions only when necessary enables the use of simpler hardware and helps reduce energy consumption. We challenge the fixed-frequency assumption by proposing a variant of RL with a variable control rate, in which the policy decides both the action the agent should take and the duration of the time step associated with that action. In this new setting, we extend Soft Actor-Critic (SAC) to compute the optimal policy with a variable control rate, introducing the Soft Elastic Actor-Critic (SEAC) algorithm. We show the efficacy of SEAC through a proof-of-concept simulation driving an agent with Newtonian kinematics. Our experiments show higher average returns, shorter task-completion times, and reduced computational resource use compared to fixed-rate policies.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
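To illustrate the variable-control-rate idea, the sketch below gives the actor two heads: one for the control action and one for how long that action is held, so the agent only spends computation on a new decision when it chooses a short time step. SAC's stochastic policy, critics, and entropy terms are omitted; the duration bounds, network shape, and toy Newtonian rollout are illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch of a policy that outputs an action plus an elastic time step.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 4, 2
DT_MIN, DT_MAX = 0.02, 0.5   # assumed bounds on the elastic time step (s)

class ElasticActor(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU())
        self.action_head = nn.Linear(64, ACT_DIM)   # control command
        self.duration_head = nn.Linear(64, 1)       # how long to hold it

    def forward(self, obs):
        h = self.body(obs)
        action = torch.tanh(self.action_head(h))        # in [-1, 1]
        frac = torch.sigmoid(self.duration_head(h))     # in (0, 1)
        dt = DT_MIN + frac * (DT_MAX - DT_MIN)          # elastic step
        return action, dt

def step_newtonian(pos, vel, accel, dt):
    """Advance 2-D Newtonian kinematics by the policy-chosen duration."""
    new_pos = pos + vel * dt + 0.5 * accel * dt ** 2
    new_vel = vel + accel * dt
    return new_pos, new_vel

# Toy rollout: longer chosen steps mean fewer policy evaluations per unit of
# simulated time, which is the source of the computational savings above.
actor = ElasticActor()
pos, vel = torch.zeros(2), torch.zeros(2)
t = 0.0
while t < 5.0:
    obs = torch.cat([pos, vel])
    with torch.no_grad():
        accel, dt = actor(obs)
    pos, vel = step_newtonian(pos, vel, accel, dt.item())
    t += dt.item()
```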
The Multi-Agent Pickup and Delivery Problem: MAPF, MARL and Its Warehouse Applications
Lau, Tim Tsz-Kit; Sengupta, Biswa
We study two state-of-the-art solutions to the multi-agent pickup and delivery (MAPD) problem based on different principles: multi-agent path-finding (MAPF) and multi-agent reinforcement learning (MARL). Specifically, we study a recent MAPF algorithm called Conflict-Based Search (CBS) and a recent MARL algorithm called Shared Experience Actor-Critic (SEAC). While the performance of these algorithms is measured with quite different metrics in their separate lines of work, we aim to benchmark the two methods comprehensively in a simulated warehouse automation environment.
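Benchmarking a planner (CBS) against a learner (SEAC) on an equal footing requires wrapping both behind a single interface and scoring them with shared metrics. The sketch below shows one shape such a harness could take; the `Solver` protocol, the environment methods, and the reward-based delivery count are hypothetical placeholders, not the authors' setup.

```python
# Minimal sketch of a unified evaluation harness for CBS-style and SEAC-style
# solvers (interface and metrics are illustrative assumptions).
from dataclasses import dataclass
from typing import Protocol, Sequence

class Solver(Protocol):
    def act(self, observations: Sequence) -> Sequence:
        """Return one action per agent (planned for CBS, learned for SEAC)."""
        ...

@dataclass
class EpisodeResult:
    deliveries: int
    steps: int

    @property
    def throughput(self) -> float:
        return self.deliveries / self.steps

def evaluate(env, solver: Solver, episodes: int, max_steps: int):
    """Run a solver through identical episodes and record shared metrics."""
    results = []
    for _ in range(episodes):
        obs = env.reset()
        deliveries = 0
        for step in range(1, max_steps + 1):
            obs, rewards, done = env.step(solver.act(obs))
            deliveries += sum(r > 0 for r in rewards)  # assumes reward marks a delivery
            if done:
                break
        results.append(EpisodeResult(deliveries, step))
    return results
```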