AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Belief Projection-Based Reinforcement Learning for Environments with Delayed Feedback

Neural Information Processing SystemsApr-24-2026, 05:30:38 GMT

We present a novel actor-critic algorithm for an environment with delayed feedback, which addresses the state-space explosion problem of conventional approaches. Conventional approaches use an augmented state constructed from the last observed state and actions executed since visiting the last observed state Using the augmented state space, the correct Markov decision process for delayed environments can be constructed; however, this causes the state space to explode as the number of delayed timesteps increases, leading to slow convergence. Our proposed algorithm, called Belief-Projection-Based Q-learning (BPQL), addresses the state-space explosion problem by evaluating the values of the critic for which the input state size is equal to the original state-space size rather than that of the augmented one. We compare BPQL to traditional approaches in continuous control tasks and demonstrate that it significantly outperforms other algorithms in terms of asymptotic performance and sample efficiency. We also show that BPQL solves long-delayed environments, which conventional approaches are unable to do.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: Europe (0.93)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Diffused Task-Agnostic Milestone Planner

Neural Information Processing SystemsApr-24-2026, 05:03:42 GMT

Addressing decision-making problems using sequence modeling to predict future trajectories shows promising results in recent years. In this paper, we take a step further to leverage the sequence predictive method in wider areas such as long-term planning, vision-based control, and multi-task decision-making. To this end, we propose a method to utilize a diffusion-based generative sequence model to plan a series of milestones in a latent space and to have an agent to follow the milestones to accomplish a given task. The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control. Furthermore, our approach exploits generation flexibility of the diffusion model, which makes it possible to plan diverse trajectories for multi-task decision-making. We demonstrate the proposed method across offline reinforcement learning (RL) benchmarks and an visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving the state-of-the-art performance on the most challenging vision-based manipulation benchmark.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Appendix Gigastep - One Billion Steps per Second Multi-agent Reinforcement Learning

Neural Information Processing SystemsApr-24-2026, 04:43:02 GMT

In this section, we train policies for different scenarios to validate that the tasks defined in Gigastep can be solved with multi-agent RL algorithms. In particular, we use multi-agent PPO implemented in JAX. In competitive or adversarial MARL, an objective reward measure is not defined, as the collected reward inherently depends on the relative strength of the opposing agent's policy. Therefore, to measure the training progress, we compare the current policy with previous checkpoints of the same policy at earlier training iterations. Specifically, an improving policy should be able to outperform its previous counterparts.

checkpoint, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.32)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

00ba06ba5c324efdfb068865ca44cf0b-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-24-2026, 04:42:59 GMT

evolutionary algorithm, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts (0.15)

Industry:

Leisure & Entertainment > Games > Computer Games (0.68)
Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
(3 more...)

Add feedback

Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator

Neural Information Processing SystemsApr-24-2026, 04:16:51 GMT

Meta-reinforcement learning (Meta-RL) has attracted attention due to its capability to enhance reinforcement learning (RL) algorithms, in terms of data efficiency and generalizability. In this paper, we develop a bilevel optimization framework for meta-RL (BO-MRL) to learn the meta-prior for task-specific policy adaptation, which implements multiple-step policy optimization on one-time data collection. Beyond existing meta-RL analyses, we provide upper bounds of the expected optimality gap over the task distribution. This metric measures the distance of the policy adaptation from the learned meta-prior to the task-specific optimum, and quantifies the model's generalizability to the task distribution. We empirically validate the correctness of the derived upper bounds and demonstrate the superior effectiveness of the proposed algorithm over benchmarks.

artificial intelligence, machine learning, reinforcement learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback

001608167bb652337af5df0129aeaabd-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 04:16:46 GMT

arxiv preprint arxiv, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.93)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

Add feedback

Zap Q-Learning

Adithya M Devraj, Sean Meyn

Neural Information Processing SystemsApr-23-2026, 03:09:47 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Tejas D. Kulkarni, Karthik Narasimhan, Ardavan Saeedi, Josh Tenenbaum

Neural Information Processing SystemsApr-22-2026, 13:44:51 GMT

Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. One of the key difficulties is insufficient exploration, resulting in an agent being unable to learn robust policies. Intrinsically motivated agents can explore new behavior for their own sake rather than to directly solve external goals. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchicalDQN (h-DQN), a framework to integrate hierarchical action-value functions, operating at different temporal scales, with goal-driven intrinsically motivated deep reinforcement learning. A top-level q-value function learns a policy over intrinsic goals, while a lower-level function learns a policy over atomic actions to satisfy the given goals.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: Europe > Spain (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Learning to Communicate with Deep Multi-Agent Reinforcement Learning

Jakob Foerster, Ioannis Alexandros Assael, Nando de Freitas, Shimon Whiteson

Neural Information Processing SystemsApr-22-2026, 03:44:24 GMT

We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate endto-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Filters

Collaborating Authors

Reinforcement Learning

Belief Projection-Based Reinforcement Learning for Environments with Delayed Feedback

Diffused Task-Agnostic Milestone Planner

Appendix Gigastep - One Billion Steps per Second Multi-agent Reinforcement Learning

00ba06ba5c324efdfb068865ca44cf0b-Paper-Datasets_and_Benchmarks.pdf

007f4927e60699392425f267d43f0940-Paper-Conference.pdf

Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator

001608167bb652337af5df0129aeaabd-Paper-Conference.pdf

Zap Q-Learning

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Learning to Communicate with Deep Multi-Agent Reinforcement Learning