AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Contrastive Learning as Goal-Conditioned Reinforcement Learning Benjamin Eysenbach

Neural Information Processing SystemsAug-19-2025, 15:12:00 GMT

In reinforcement learning (RL), it is easier to solve a task if given a good representation.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Recursive Reinforcement Learning

Name

Neural Information Processing SystemsAug-19-2025, 14:56:27 GMT

These RL algorithms are designed with a flat Markovian view of the environment in the form of a "state, action, reward, and next state" interface [

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Colorado > Boulder County > Boulder (0.05)
North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context

Neural Information Processing SystemsAug-19-2025, 14:42:52 GMT

The inferred belief context can be leveraged to augment the state, leading to a policy that can adapt to abrupt variations in context.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

e2ef0cae667dbe9bfdbcaed1bd91807b-Paper-Conference.pdf

Neural Information Processing SystemsAug-19-2025, 13:34:20 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Add feedback

PerfectDou: Dominating DouDizhu with Perfect Information Distillation Guan Y ang

Neural Information Processing SystemsAug-19-2025, 13:23:18 GMT

As a challenging multi-player card game, DouDizhu has recently drawn much attention for analyzing competition and collaboration in imperfect-information games. In this paper, we propose PerfectDou, a state-of-the-art DouDizhu AI system that dominates the game, in an actor-critic framework with a proposed technique named perfect information distillation.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)

Add feedback

Transformer-based Working Memory for Multiagent Reinforcement Learning with Action Parsing Y aodong Y ang

Neural Information Processing SystemsAug-19-2025, 13:09:06 GMT

Our second contribution is to introduce an action semantic inductive bias and implement it with a technique named Entity-Bound Action Layer.

machine learning, natural language, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Industry:

Leisure & Entertainment > Games (0.68)
Health & Medicine (0.42)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Supplementary Material for Rethinking Value Function Learning for Generalization in Reinforcement Learning A Stiffness Analysis

Neural Information Processing SystemsAug-19-2025, 13:06:37 GMT

The green lines in Figure 1 demonstrate that the stiffness decreases as the number of training levels increases in most of the Procgen games. This suggests that the delayed critic update effectively alleviates the memorization problem. Each agent is trained on 200 training levels for 25M environment steps. Each agent is trained for 8M environment steps. The mean is computed over 10 different runs.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

e19ab2dde2e60cf68d1ded18c38938f4-Paper-Conference.pdf

Neural Information Processing SystemsAug-19-2025, 13:06:34 GMT

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.68)

Industry:

Health & Medicine (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

How to talk so AI will learn: Instructions, descriptions, and autonomy

Neural Information Processing SystemsAug-19-2025, 12:54:09 GMT

From the earliest years of our lives, humans use language to express our beliefs and desires. Being able to talk to artificial agents about our preferences would thus fulfill a central goal of value alignment. Y et today, we lack computational models explaining such language use.

machine learning, natural language, reinforcement learning, (11 more...)

Neural Information Processing Systems

Country: