AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
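The trial-and-error idea in this definition can be made concrete with a minimal sketch. It assumes a hypothetical three-armed Bernoulli bandit and uses epsilon-greedy action selection, one simple strategy among many. The learner is never told the best action. It only observes a numerical reward after trying an action, and it keeps a running average per action. The names `run_epsilon_greedy`, `probs` and `epsilon` are illustrative.

```python
import random


def run_epsilon_greedy(probs, epsilon=0.1, steps=5000, seed=0):
    """Trial-and-error action selection on a Bernoulli multi-armed bandit.

    probs: hypothetical success probability of each arm (hidden from the learner).
    Returns (estimates, counts): the learner's sample-average reward estimate
    and the number of times each arm was tried.
    """
    rng = random.Random(seed)
    k = len(probs)
    estimates = [0.0] * k  # running mean reward per arm
    counts = [0] * k       # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: try a random action
        else:
            # exploit: take the action with the highest estimated reward so far
            action = max(range(k), key=lambda a: estimates[a])
        # the environment returns only a numerical reward, never the "correct" action
        reward = 1.0 if rng.random() < probs[action] else 0.0
        counts[action] += 1
        # incremental sample-average update of the estimate for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print(best, counts)
```

The small `epsilon` keeps occasionally trying every action, so an arm that looked poor early on can still be discovered to be better.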

Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

Neural Information Processing Systems, Feb-7-2026, 17:04:12 GMT

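Function Approximation. Throughout the paper, we assume access to two function classes $\mathcal{Q} \subset (\mathcal{S} \times \mathcal{A} \to \mathbb{R})$ and $\mathcal{W} \subset (\mathcal{S} \times \mathcal{A} \to \mathbb{R})$. To develop intuition, they are supposed to model $Q^\pi$ and $w_{\pi/\mu}$, respectively, though most of our main results are stated without assuming any kind of realizability.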
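To show why one class models $Q^\pi$ and the other models $w_{\pi/\mu}$, here is a small tabular sketch. It assumes that $w_{\pi/\mu}$ denotes the ratio of the normalized discounted occupancy $d^\pi$ to the data distribution $\mu$ over state-action pairs, as is common in this line of work. The MDP (`P`, `R`, `D0`, `PI`, `MU`) is hypothetical. The function `loss` is the standard off-policy evaluation identity $L(w,q) = (1-\gamma)\,\mathbb{E}_{s_0\sim d_0}[q(s_0,\pi)] + \mathbb{E}_{\mu}[w(s,a)(r + \gamma q(s',\pi) - q(s,a))]$, written here as an illustration and not as the paper's exact estimator. The code checks numerically that if either argument is correct ($w = w_{\pi/\mu}$ or $q = Q^\pi$), $L$ equals the normalized value $(1-\gamma)v^\pi$, whatever the other argument is.

```python
GAMMA = 0.9
S, A = 2, 2
# Hypothetical toy MDP: P[s][a][s2] = transition probability, R[s][a] = reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.05, 0.95]]]
R = [[0.0, 1.0],
     [0.5, 2.0]]
D0 = [1.0, 0.0]                    # initial-state distribution d_0
PI = [[0.3, 0.7], [0.6, 0.4]]      # target policy pi(a|s)
MU = [[0.25, 0.25], [0.25, 0.25]]  # data distribution mu(s, a)


def next_value(q, s, a):
    """E_{s' ~ P(.|s,a)}[q(s', pi)], with q(s', pi) = sum_a' pi(a'|s') q(s', a')."""
    return sum(P[s][a][s2] * sum(PI[s2][a2] * q[s2][a2] for a2 in range(A))
               for s2 in range(S))


def q_pi(iters=2000):
    """Q^pi by iterative policy evaluation (Bellman fixed-point iteration)."""
    q = [[0.0] * A for _ in range(S)]
    for _ in range(iters):
        q = [[R[s][a] + GAMMA * next_value(q, s, a) for a in range(A)] for s in range(S)]
    return q


def occupancy(iters=2000):
    """Normalized discounted occupancy d^pi(s, a), iterating its Bellman flow equation."""
    d = [[0.0] * A for _ in range(S)]
    for _ in range(iters):
        inflow = [sum(d[s][a] * P[s][a][s2] for s in range(S) for a in range(A))
                  for s2 in range(S)]
        d = [[((1 - GAMMA) * D0[s2] + GAMMA * inflow[s2]) * PI[s2][a2]
              for a2 in range(A)] for s2 in range(S)]
    return d


def loss(w, q):
    """L(w, q) = (1-g) E_d0[q(s0, pi)] + E_mu[w(s,a) (r + g q(s', pi) - q(s, a))]."""
    init = (1 - GAMMA) * sum(D0[s] * sum(PI[s][a] * q[s][a] for a in range(A))
                             for s in range(S))
    td = sum(MU[s][a] * w[s][a] * (R[s][a] + GAMMA * next_value(q, s, a) - q[s][a])
             for s in range(S) for a in range(A))
    return init + td


Q = q_pi()
d = occupancy()
W = [[d[s][a] / MU[s][a] for a in range(A)] for s in range(S)]  # w_{pi/mu}
v_pi = sum(D0[s] * sum(PI[s][a] * Q[s][a] for a in range(A)) for s in range(S))
target = (1 - GAMMA) * v_pi  # normalized policy value

# Arbitrary (wrong) candidates: one correct component is enough to recover the value.
q_bad = [[3.0, -1.0], [0.0, 7.0]]
w_bad = [[2.0, 0.1], [0.5, 1.5]]
print(target, loss(W, q_bad), loss(w_bad, Q))
```

When neither class contains the true function, $L$ no longer pins down the value exactly. A minimax approach over $\mathcal{Q}$ and $\mathcal{W}$ is then a natural way to bound it.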

lbw, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

1a6727711b84fd1efbb87fc565199d13-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 16:56:11 GMT

algorithm, distribution network, ieee transaction

Neural Information Processing Systems

Country:

Europe > United Kingdom (0.14)
Asia > India (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable > Solar (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

0b13c22ca208bc08f3fd13793292f25f-Paper-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 16:46:25 GMT

algorithm, policy optimization algorithm, sample complexity

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Weakly-Supervised Reinforcement Learning for Controllable Behavior

Neural Information Processing Systems, Feb-7-2026, 16:44:33 GMT

We show that this learned subspace enables efficient exploration and provides a representation that captures distance between states.

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

Neural Information Processing Systems, Feb-7-2026, 16:24:40 GMT

In particular, using stochastic gradients in MAML update steps is crucial for RL problems since computation of exact gradients requires access to a large number of possible trajectories.
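The point about stochastic gradients can be illustrated with a toy sketch. It assumes a hypothetical task whose "trajectories" are single actions of a softmax policy (a bandit), so that the exact gradient is cheap enough to compare against. The exact gradient needs the reward of every possible action. In a real MDP that would mean every possible trajectory. The REINFORCE estimate needs only a minibatch of sampled ones. `maml_inner_step` shows one MAML-style adaptation step built on the sampled estimate. It omits the outer meta-update and is not the paper's debiased estimator. All names here are illustrative.

```python
import math
import random


def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def exact_grad(theta, rewards):
    """Exact gradient of J(theta) = sum_a pi(a) r(a); requires every action's reward."""
    pi = softmax(theta)
    j = sum(p * r for p, r in zip(pi, rewards))
    return [pi[a] * (rewards[a] - j) for a in range(len(theta))]


def reinforce_grad(theta, rewards, n, rng):
    """Unbiased stochastic (REINFORCE) estimate from n sampled one-step trajectories."""
    pi = softmax(theta)
    k = len(theta)
    g = [0.0] * k
    for _ in range(n):
        a = rng.choices(range(k), weights=pi)[0]
        # for a softmax policy, grad log pi(a) = e_a - pi
        for b in range(k):
            g[b] += rewards[a] * ((1.0 if b == a else 0.0) - pi[b]) / n
    return g


def maml_inner_step(theta, rewards, alpha, n, rng):
    """One adaptation step theta' = theta + alpha * g_hat (ascent on expected reward)."""
    g = reinforce_grad(theta, rewards, n, rng)
    return [t + alpha * gi for t, gi in zip(theta, g)]


rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
task_rewards = [0.1, 0.5, 0.9]  # hypothetical task: reward of each action
g_exact = exact_grad(theta, task_rewards)
g_hat = reinforce_grad(theta, task_rewards, 20000, rng)
theta_adapted = maml_inner_step(theta, task_rewards, 0.5, 64, rng)
print(g_exact, g_hat, theta_adapted)
```

The estimate is unbiased for the inner gradient, but its variance shrinks only as the number of sampled trajectories grows. A small minibatch, such as the 64 samples used for adaptation here, gives a noisy step.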

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

178b306c7ee66a66db2171646e17da36-Paper-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 16:15:17 GMT

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning

Neural Information Processing Systems, Feb-7-2026, 16:06:46 GMT

Behavioral cloning has proven to be effective for learning sequential decision-making policies from expert demonstrations.

expert demonstration, machine learning, reinforcement learning

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Visual Adversarial Imitation Learning using Variational Models

Neural Information Processing Systems, Feb-7-2026, 16:06:10 GMT

Behaviour cloning (BC) is a classic algorithm to imitate expert demonstrations [7], which uses supervised learning to greedily match the expert behaviour at demonstrated expert states. Due to environment stochasticity, covariate shift, and policy approximation error, the agent may drift away from the expert state distribution and ultimately fail to mimic the demonstrator [8].
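The supervised-learning view of behaviour cloning can be sketched in a simplified tabular setting. This assumes discrete states and actions and hypothetical demonstration data. It is not the adversarial, variational method of the paper above. Fitting a categorical policy by maximum likelihood reduces to counting the expert's action frequencies at each demonstrated state. The `default` branch in `act` shows the weakness described above: once the agent drifts to a state the expert never visited, BC has no training signal for it.

```python
from collections import Counter, defaultdict


def behaviour_cloning(demos):
    """Tabular behaviour cloning: at each demonstrated state, the maximum-likelihood
    categorical policy is the empirical frequency of the expert's actions."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    policy = {}
    for state, ctr in counts.items():
        total = sum(ctr.values())
        policy[state] = {a: c / total for a, c in ctr.items()}
    return policy


def act(policy, state, default=None):
    """Greedy action at `state`; states never seen in the demonstrations get
    `default`, since supervised BC learned nothing about them."""
    if state not in policy:
        return default
    return max(policy[state], key=policy[state].get)


# Hypothetical expert demonstrations as (state, action) pairs.
demos = [("s0", "right"), ("s0", "right"), ("s0", "up"), ("s1", "right"), ("s2", "up")]
policy = behaviour_cloning(demos)
print(act(policy, "s0"), act(policy, "s3"))
```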

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country: