AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.


RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning

Neural Information Processing Systems | Aug-15-2025, 11:28:57 GMT

The model is trained to minimise the value function while still accurately predicting the transitions in the dataset, forcing the policy to act conservatively in areas not covered by the dataset. To approximately solve the two-player game, we alternate between optimising the policy and adversarially optimising the model.
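The alternation described above can be illustrated on a deliberately tiny problem. The following is a minimal sketch, not the paper's algorithm: it assumes a single-state problem with discrete actions, a softmax policy, and a "model" that is just a table of predicted rewards per action. All names and parameters (`toy_adversarial_offline_rl`, `fit_weight`, `reward_floor`, and so on) are hypothetical and chosen for illustration. The model step lowers the policy's value while a squared-error term keeps its predictions close to the dataset, so only actions absent from the dataset can be made pessimistic.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()


def toy_adversarial_offline_rl(actions, rewards, n_actions, steps=2000,
                               lr_policy=0.5, lr_model=0.02,
                               fit_weight=20.0, reward_floor=-1.0,
                               init_pred=1.0):
    """Alternate policy ascent and adversarial model descent (toy version).

    Assumption: one state, so the policy value under the model is simply
    V = sum_a pi(a) * pred[a].
    """
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    logits = np.zeros(n_actions)           # policy parameters
    pred = np.full(n_actions, init_pred)   # model: predicted reward per action
    counts = np.bincount(actions, minlength=n_actions)
    sums = np.bincount(actions, weights=rewards, minlength=n_actions)

    for _ in range(steps):
        # Policy step: gradient ascent on V with respect to the logits.
        pi = softmax(logits)
        value = pi @ pred
        logits += lr_policy * pi * (pred - value)

        # Model step: gradient descent on V + fit_weight * mean squared error.
        pi = softmax(logits)
        grad_fit = 2.0 * (counts * pred - sums) / len(actions)
        pred -= lr_model * (pi + fit_weight * grad_fit)
        pred = np.clip(pred, reward_floor, None)  # keep predictions bounded

    return softmax(logits), pred
```

With a dataset that covers actions 0 and 1 but never action 2, the predicted reward for action 2 is pushed down because nothing in the data constrains it, and the policy ends up preferring the best action that the dataset actually supports.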

dataset, international conference, reinforcement learning, (12 more...)


Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


A Experimental Details

Neural Information Processing Systems | Aug-15-2025, 11:14:04 GMT

Gym tasks are shown below in Table 8.

Table 8. Hyperparameter: Value
Number of layers: 3
Number of attention heads: 1
Embedding dimension: 128
Nonlinearity function: ReLU
Batch size: 64
Context length K: 20 (HalfCheetah, Hopper, Walker); 5 (Reacher)
Return-to-go conditioning: 6000 (HalfCheetah); 3600 (Hopper); 5000 (Walker); 50 (Reacher)
Dropout: 0.1
Learning rate: 10 [exponent missing]

As briefly mentioned in Section 4.2, we found previously reported behavior cloning baselines to be [...] The percentile behavior cloning experiments use the same hyperparameters.

We give details of the illustrative example discussed in the introduction. The action is the integer index of the graph node to move to next. In this environment, we use the GPT model as described in Section 3 to generate both actions and return-to-go tokens.
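As a rough sketch, the Gym hyperparameters listed in Table 8 above could be gathered into a small configuration object. The class and function names (`GymModelConfig`, `settings_for`) are hypothetical and do not come from the authors' code. The learning rate is left out because its exponent is missing from the excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GymModelConfig:
    """Shared model and optimisation settings from Table 8."""
    n_layers: int = 3
    n_attention_heads: int = 1
    embedding_dim: int = 128
    nonlinearity: str = "relu"
    batch_size: int = 64
    dropout: float = 0.1


# Per-environment values, as listed in the table.
CONTEXT_LENGTH = {"HalfCheetah": 20, "Hopper": 20, "Walker": 20, "Reacher": 5}
RETURN_TO_GO = {"HalfCheetah": 6000, "Hopper": 3600, "Walker": 5000, "Reacher": 50}


def settings_for(env_name):
    """Return the full set of settings for one environment."""
    return {
        "model": GymModelConfig(),
        "context_length": CONTEXT_LENGTH[env_name],
        "return_to_go": RETURN_TO_GO[env_name],
    }
```

For example, `settings_for("Reacher")` pairs the shared model settings with a context length of 5 and a return-to-go target of 50.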

dataset, decision transformer, experiment, (12 more...)


Industry: Leisure & Entertainment > Games > Computer Games (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)


Decision Transformer: Reinforcement Learning via Sequence Modeling

Neural Information Processing Systems | Aug-15-2025, 11:14:00 GMT

We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem.
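One way to picture the sequence-modeling view is to flatten a trajectory into a single stream of tokens that a sequence model can consume. The sketch below assumes the commonly described layout in which each timestep contributes a return-to-go, a state, and an action, in that order. The function names and the tagged-tuple representation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def returns_to_go(rewards):
    """Sum of rewards from each timestep to the end of the trajectory."""
    r = np.asarray(rewards, dtype=float)
    return np.cumsum(r[::-1])[::-1]


def to_token_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) for every timestep."""
    sequence = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.append(("rtg", float(g)))
        sequence.append(("state", s))
        sequence.append(("action", a))
    return sequence
```

For rewards `[1, 2, 3]` the returns-to-go are `[6, 5, 3]`, so the first token tells the model how much total reward remains to be collected from the start of the trajectory.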

arxiv preprint arxiv, decision transformer, reinforcement learning, (8 more...)


Genre: Research Report (0.47)

Industry: Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


9ec51f6eb240fb631a35864e13737bca-Paper.pdf

Neural Information Processing Systems | Aug-15-2025, 11:13:01 GMT

algorithm, approximation, function approximation, (12 more...)


Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > Canada > Alberta (0.14)
Asia > China > Beijing > Beijing (0.04)
(5 more...)

Industry:

Health & Medicine (0.68)
Information Technology (0.46)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)


9ec51f6eb240fb631a35864e13737bca-AuthorFeedback.pdf

Neural Information Processing Systems | Aug-15-2025, 11:12:50 GMT

algorithm, gradient tracking, submission, (13 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)


7f141cf8e7136ce8701dc6636c2a6fe4-Paper.pdf

Neural Information Processing Systems | Aug-15-2025, 11:12:32 GMT

decision maker, decision rule, loss function, (14 more...)


Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


PettingZoo: A Standard API for Multi-Agent Reinforcement Learning

J. K. Terry

Neural Information Processing Systems | Aug-15-2025, 10:54:50 GMT

This paper introduces the PettingZoo library and the accompanying Agent Environment Cycle ("AEC") games model. PettingZoo is a library of diverse sets of multi-agent environments with a universal, elegant Python API. PettingZoo was developed with the goal of accelerating research in Multi-Agent Reinforcement Learning ("MARL") by making work more interchangeable, accessible and reproducible, akin to what OpenAI's Gym library did for single-agent reinforcement learning.
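The core idea behind a cycle-based games model, in which agents act one at a time and the environment updates after each individual action, can be sketched without any library. The class below is a hypothetical stand-in written for illustration. It is not PettingZoo's API, and its names (`ToyTurnGame`, `play`) are assumptions. Two agents take turns adding 1 or 2 to a running total, and whoever reaches the target first receives a reward of 1.

```python
from itertools import cycle


class ToyTurnGame:
    """Two-player counting game where exactly one agent acts per step."""

    def __init__(self, agents=("player_0", "player_1"), target=5):
        self.agents = list(agents)
        self.target = target
        self.reset()

    def reset(self):
        self.total = 0
        self.rewards = {agent: 0.0 for agent in self.agents}
        self.done = False
        self._order = cycle(self.agents)
        self.current = next(self._order)  # agent whose turn it is

    def observe(self):
        """Observation for the current agent: the running total."""
        return self.total

    def step(self, action):
        """Apply the current agent's action, then pass the turn on."""
        if self.done:
            raise RuntimeError("game is over; call reset()")
        if action not in (1, 2):
            raise ValueError("action must be 1 or 2")
        self.total += action
        if self.total >= self.target:
            self.rewards[self.current] = 1.0
            self.done = True
        else:
            self.current = next(self._order)


def play(env, policies):
    """Run one episode; policies maps agent name -> function(observation)."""
    env.reset()
    history = []
    while not env.done:
        agent = env.current
        action = policies[agent](env.observe())
        history.append((agent, action))
        env.step(action)
    return history, env.rewards
```

Because only one agent acts per step, the same loop works for any number of agents, and each agent always sees the effect of the previous agent's move before choosing its own.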

agent, api, reinforcement learning, (13 more...)


Country: