AITopics | Reinforcement Learning

Episodic Multi agent Reinforcement Learning with Curiosity driven Exploration

Neural Information Processing SystemsApr-25-2026, 00:49:26 GMT

Efficient exploration in deep cooperative multi-agent reinforcement learning (MARL) still remains challenging in complex coordination problems. In this paper, we introduce a novel Episodic Multi-agent reinforcement learning with Curiosity-driven exploration, called EMC. We leverage an insight of popular factorized MARL algorithms that the "induced" individual Q-values, i.e., the individual utility functions used for local execution, are the embeddings of local actionobservation histories, and can capture the interaction between agents due to reward backpropagation during centralized training. Therefore, we use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training. As the dynamics of an agent's individual Q-value function captures the novelty of states and the influence from other agents, our intrinsic reward can induce coordinated exploration to new or promising states. We illustrate the advantages of our method by didactic examples, and demonstrate its significant outperformance over state-of-the-art MARL baselines on challenging tasks in the StarCraft II micromanagement benchmark.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

Add feedback

Episodic Multi agent Reinforcement Learning with Curiosity driven Exploration

Neural Information Processing SystemsApr-25-2026, 00:49:22 GMT

Efficient exploration in deep cooperative multi-agent reinforcement learning (MARL) still remains challenging in complex coordination problems. In this paper, we introduce a novel Episodic Multi-agent reinforcement learning with Curiosity-driven exploration, called EMC. We leverage an insight of popular factorized MARL algorithms that the "induced" individual Q-values, i.e., the individual utility functions used for local execution, are the embeddings of local actionobservation histories, and can capture the interaction between agents due to reward backpropagation during centralized training. Therefore, we use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training. As the dynamics of an agent's individual Q-value function captures the novelty of states and the influence from other agents, our intrinsic reward can induce coordinated exploration to new or promising states. We illustrate the advantages of our method by didactic examples, and demonstrate its significant outperformance over state-of-the-art MARL baselines on challenging tasks in the StarCraft II micromanagement benchmark.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Industry: Leisure & Entertainment > Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.49)

Add feedback

1e79596878b2320cac26dd792a6c51c9-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 00:48:18 GMT

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country: North America > Canada > Alberta (0.28)

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

1f3b0b15d6bb860dcfa6e5c8ba7d3d96-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 00:47:50 GMT

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

Europe (0.68)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Energy (0.67)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Game Theory (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Add feedback

1e747ddbea997a1b933aaf58a7953c3c-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 00:47:39 GMT

machine learning, natural language, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Overview (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
(2 more...)

Add feedback

Appendix for On Effective Scheduling of Model based Reinforcement Learning

Neural Information Processing SystemsApr-25-2026, 00:31:26 GMT

We call c(m) the m-step concentrability of a future-state distribution and call Cρ,µ the discountedaverage concentrability coefficient of the future-state distributions. The class of MDPs that satisfies this concentrability assumption is quite large, which is further discussed in Munos and Szepesvári [18]. If Xi, i = 1,...,N is an i.i.d. And when q = 1, N is used instead of N1. From the definition, one can esasily see that Nq,FX1:N N. Lemma A.2. (Single Iteration Error Bound) Let Vk and Vk+1 be the value functions of iteration kand k+1, and Vmax = rmax/(1 γ).

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

1e4d36177d71bbb3558e43af9577d70e-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 00:31:23 GMT

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)

Add feedback

1e0f65eb20acbfb27ee05ddc000b50ec-Supplemental.pdf

ruihanyang

Neural Information Processing SystemsApr-25-2026, 00:31:09 GMT

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.95)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Add feedback

1e0f65eb20acbfb27ee05ddc000b50ec-Paper.pdf

ruihanyang

Neural Information Processing SystemsApr-25-2026, 00:31:06 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Neural Auto-Curricula

Neural Information Processing SystemsApr-24-2026, 23:50:08 GMT

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game theoretical principles such as fictitious play and Double Oracle. In this paper1, we introduce a novel framework--Neural Auto-Curricula (NAC)--that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the bestresponse module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance with the state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.

arxiv preprint arxiv, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology: