AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

A Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Xueying Bai, Jian Guan, Hongning Wang

Neural Information Processing SystemsFeb-14-2026, 19:22:30 GMT

Reinforcement learning is well suited for optimizing policies of recommender systems.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

764c18ad230f9e7bf6a77ffc2312c55e-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-14-2026, 19:22:12 GMT

machine learning, reinforcement learning, shield 0, (17 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Add feedback

764c18ad230f9e7bf6a77ffc2312c55e-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-14-2026, 19:22:09 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

REBEL: Reinforcement Learning via Regressing Relative Rewards Zhaolin Gao 1, Jonathan D. Chang

Neural Information Processing SystemsFeb-14-2026, 19:00:19 GMT

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g.

large language model, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > France (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Sports > Hockey (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation

Jiaxuan You, Bowen Liu, Zhitao Ying, Vijay Pande, Jure Leskovec

Neural Information Processing SystemsFeb-14-2026, 18:58:24 GMT

Neural Information Processing Systems http://nips.cc/

graph, molecule, representation, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
(2 more...)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Add feedback

Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

Harsh Gupta, R. Srikant, Lei Ying

Neural Information Processing SystemsFeb-14-2026, 18:11:57 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.05)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reviews: Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

Neural Information Processing SystemsFeb-14-2026, 18:11:48 GMT

NeurIPS 2019 Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center "2626" "Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning" The reviewers unanimously support acceptance. We encourage the authors to strongly consider the suggestions provided by the reviewers for improving a camera ready version.

artificial intelligence, file paper 2019, reinforcement learning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)

Add feedback

We thank all the reviewers for their encouraging comments

Neural Information Processing SystemsFeb-14-2026, 18:11:42 GMT

We thank all the reviewers for their encouraging comments. In both these cases, τ is effectively zero. Liu et al. shows how GTD-class algorithms can be formally derived using a primal-dual saddle point Sutton et al. presents a (single time-scale) variant of linear TD learning, which they call emphatic TD and show that They also provide an asymptotic convergence analysis to the set of local optima. If the paper is accepted, we will work further on improving the clarity of the work.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Add feedback