AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.


main-neurips22

Karan Singh

Neural Information Processing Systems | Feb-12-2026, 06:47:54 GMT

algorithm, learning, weak learner, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)


VIREL: A Variational Inference Framework for Reinforcement Learning

Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson

Neural Information Processing Systems | Feb-12-2026, 06:27:13 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, learning, proceedings, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(12 more...)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


57db7d68d5335b52d5153a4e01adaa6b-Paper.pdf

Neural Information Processing Systems | Feb-12-2026, 06:19:21 GMT

hindsight goal, international conference, learning, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Adaptive Skip Intervals: Temporal Abstraction for Recurrent Dynamical Models

Alexander Neitz, Giambattista Parascandolo, Stefan Bauer, Bernhard Schölkopf

Neural Information Processing Systems | Feb-12-2026, 05:52:39 GMT

Moreover, in many situations, there exist prediction intervals which result in particularly easy-to-predict transitions.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)


56bd37d3a2fda0f2f41925019c81011d-Paper.pdf

Neural Information Processing Systems | Feb-12-2026, 05:51:43 GMT

attacker, defender, information, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
North America > Canada (0.04)
Asia > Malaysia (0.04)

Industry:

Government > Military (0.93)
Information Technology > Security & Privacy (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)


Total stochastic gradient algorithms and applications in reinforcement learning

Paavo Parmas

Neural Information Processing Systems | Feb-12-2026, 05:41:05 GMT

Neural Information Processing Systems http://nips.cc/

estimator, gradient, gradient estimator, (15 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)


d8684e49752e06ac5e4b554b60ad212a-Paper-Conference.pdf

Neural Information Processing Systems | Feb-12-2026, 05:31:28 GMT

abstract simulator, algorithm, target environment, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)


Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

Neural Information Processing Systems | Feb-12-2026, 05:29:44 GMT

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observed in delay.

machine learning, qkh, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Nevada (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)


Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

Neural Information Processing Systems | Feb-12-2026, 05:29:41 GMT

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observed in delay.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: