AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Self-Imitation Learning via Generalized Lower Bound Q-learning

Neural Information Processing Systems, Feb-9-2026, 14:54:58 GMT

The naive importance sampling (IS) estimator involves products of the form π(a_t | x_t)/µ(a_t | x_t) and is infeasible in practice because of its high variance. To control the variance, a line of prior work has focused on operator-based estimation, which avoids full IS products by reducing the estimation procedure to repeated iterations of off-policy evaluation operators [1-3].
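To make the variance problem concrete, the Python sketch below compares full IS products with Retrace-style truncated traces, c_t = λ·min(1, π/µ), which are one well-known way an operator-based method bounds the per-step weights. The function names, the two-action example policies, and the choice of Retrace-style truncation are illustrative assumptions. They are not necessarily the estimator used in this paper or in [1-3].

```python
import numpy as np


def full_is_weights(pi_probs, mu_probs):
    """Running products prod_{s<=t} pi(a_s|x_s) / mu(a_s|x_s) along one trajectory."""
    ratios = np.asarray(pi_probs, dtype=float) / np.asarray(mu_probs, dtype=float)
    return np.cumprod(ratios)


def truncated_traces(pi_probs, mu_probs, lam=1.0):
    """Retrace-style running products of c_s = lam * min(1, pi/mu).

    For lam <= 1 every factor lies in [0, 1], so the product can never blow up.
    """
    ratios = np.asarray(pi_probs, dtype=float) / np.asarray(mu_probs, dtype=float)
    return np.cumprod(lam * np.minimum(1.0, ratios))


def is_product_variance(pi, mu, horizon):
    """Exact variance of the full IS product over `horizon` i.i.d. steps, actions drawn from mu.

    Each ratio has mean 1 under mu and second moment sum_a pi(a)^2 / mu(a),
    so Var = (sum_a pi(a)^2 / mu(a)) ** horizon - 1.
    """
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    second_moment = np.sum(pi ** 2 / mu)
    return second_moment ** horizon - 1.0


if __name__ == "__main__":
    pi = np.array([0.9, 0.1])  # hypothetical target policy (state-independent for simplicity)
    mu = np.array([0.5, 0.5])  # hypothetical uniform behaviour policy
    rng = np.random.default_rng(0)
    actions = rng.choice(2, size=20, p=mu)  # one behaviour trajectory of length 20
    print("full IS weight at t=20:", full_is_weights(pi[actions], mu[actions])[-1])
    print("truncated trace at t=20:", truncated_traces(pi[actions], mu[actions])[-1])
    for horizon in (1, 5, 20):
        print(horizon, is_product_variance(pi, mu, horizon))
```

For these example policies the variance of the full product is 1.64^T − 1, which already exceeds 10^4 at T = 20, while the truncated traces stay within [0, 1] by construction.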

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

82e9e7a12665240d13d0b928be28f230-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 14:47:11 GMT

algorithmic bottleneck, arxiv preprint arxiv, executor, (12 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Serbia > Central Serbia > Belgrade (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL

Neural Information Processing Systems, Feb-9-2026, 14:46:07 GMT

artificial intelligence, international conference on machine learning, reinforcement learning, (11 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
Oceania > Australia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

main

Zhuoran Yang

Neural Information Processing Systems, Feb-9-2026, 14:36:21 GMT

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis

Neural Information Processing Systems, Feb-9-2026, 14:06:02 GMT

While TD updates are simple, a rigorous analysis of TD methods requires sophisticated tools.
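For reference, the basic linear TD(0) update is θ ← θ + α δ φ(s), with TD error δ = r + γ φ(s')ᵀθ − φ(s)ᵀθ. The Python sketch below implements that step. It also implements a generic decentralized variant in which agents first average their neighbours' parameters through a doubly stochastic mixing matrix W and then apply their local TD correction. The decentralized function, its name, and the mixing scheme are hypothetical illustrations, not the paper's TD-tracking algorithm.

```python
import numpy as np


def td0_linear_step(theta, phi_s, phi_next, reward, gamma, alpha):
    """One linear TD(0) step; returns the updated weights and the TD error."""
    delta = reward + gamma * (phi_next @ theta) - phi_s @ theta
    return theta + alpha * delta * phi_s, delta


def decentralized_td_step(thetas, W, phi_s, phi_next, rewards, gamma, alpha):
    """Hypothetical consensus + local TD(0) step for agents observing the same transition.

    thetas:  (n_agents, d) parameters, one row per agent
    W:       (n_agents, n_agents) doubly stochastic mixing matrix (communication graph weights)
    rewards: (n_agents,) local reward observed by each agent
    TD errors are computed with each agent's parameters *before* mixing.
    """
    deltas = rewards + gamma * (thetas @ phi_next) - thetas @ phi_s
    mixed = W @ thetas
    return mixed + alpha * deltas[:, None] * phi_s[None, :]


if __name__ == "__main__":
    phi_s = np.array([1.0, 0.0])     # hypothetical feature vector of s
    phi_next = np.array([0.0, 1.0])  # hypothetical feature vector of s'
    theta, delta = td0_linear_step(np.zeros(2), phi_s, phi_next, reward=1.0, gamma=0.9, alpha=0.1)
    print(theta, delta)
    W = np.full((2, 2), 0.5)  # two fully connected agents, equal weights
    thetas = decentralized_td_step(np.zeros((2, 2)), W, phi_s, phi_next,
                                   np.array([1.0, 3.0]), gamma=0.9, alpha=0.1)
    print(thetas)
```

Because W is doubly stochastic, the agents' average parameter in this sketch follows an ordinary TD(0) step driven by the average reward, which is the usual motivation for consensus-based schemes; the finite-time analysis is about how fast the individual agents track that average.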

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

69413f87e5a34897cd010ca698097d0a-Paper-Conference.pdf

Neural Information Processing Systems, Feb-9-2026, 13:55:57 GMT

agent, arxiv preprint arxiv, sequence, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.47)

24662461d2194d1bc70a47b6b6771026-Paper-Conference.pdf

Neural Information Processing Systems, Feb-9-2026, 13:47:15 GMT

Existing works mainly focus on arranging the levels to explicitly form a curriculum. In this work, we take a close look at the learning process itself under multi-level training in Procgen.

justification, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

Discovered Policy Optimisation

Neural Information Processing Systems, Feb-9-2026, 13:45:34 GMT

Most of these advancements came through the continual development of new algorithms, which were designed using a combination of mathematical derivations, intuition, and experimentation. This manual approach to algorithm design is limited by human understanding and ingenuity.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

A Local Temporal Difference Code for Distributional Reinforcement Learning

Neural Information Processing Systems, Feb-9-2026, 13:45:16 GMT

However, since this decoder effectively approximates the nth derivative of the input vector, it is very sensitive to noise. In our framework, the input is often very noisy, since it corresponds to the converging points of different learning traces. In this section we describe two linear decoders that differ from the one in [35] and are more noise-resilient. The normalization in A.9 and A.10 is crucial for long temporal horizons, since regularization causes the overall magnitude of the recovered τ-space to decrease as τ increases.³ Normalization fixes this decreasing-magnitude problem by making the τ-space sum to 1 for every τ.
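A minimal Python sketch of the two ingredients discussed above, under stated assumptions: a Tikhonov (ridge) regularized linear decoder stands in for a noise-resilient linear decoder, and the recovered τ-space is stored as a hypothetical 2-D array with one row per τ, which is clipped at zero and rescaled so that each row sums to 1. The encoding matrix, the ridge penalty, the function names, and the array layout are illustrative assumptions; the paper's actual decoders and normalization may differ.

```python
import numpy as np


def ridge_decode(F, v, alpha):
    """Regularized linear decoder: argmin_p ||F p - v||^2 + alpha * ||p||^2.

    F: (m, n) linear encoding matrix, v: (m,) noisy observations.
    Larger alpha trades reconstruction fidelity for robustness to noise in v,
    but also shrinks the overall magnitude of the recovered p.
    """
    n = F.shape[1]
    return np.linalg.solve(F.T @ F + alpha * np.eye(n), F.T @ v)


def normalize_per_tau(tau_space, eps=1e-12):
    """Clip negatives, then rescale each row (one row per tau) to sum to 1.

    Rows that are entirely non-positive after clipping stay at zero.
    """
    clipped = np.clip(tau_space, 0.0, None)
    row_sums = clipped.sum(axis=1, keepdims=True)
    return clipped / np.maximum(row_sums, eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gammas = np.linspace(0.5, 0.99, 30)       # hypothetical set of discount factors
    delays = np.arange(10)                    # hypothetical time delays 0..9
    F = gammas[:, None] ** delays[None, :]    # hypothetical encoding: F[i, t] = gamma_i ** t
    p_true = np.zeros(10)
    p_true[3] = 1.0                           # a single reward arriving at delay 3
    v = F @ p_true + 0.01 * rng.standard_normal(30)
    print(np.round(ridge_decode(F, v, alpha=1e-2), 2))

    shrinking = np.array([[0.2, 0.6, 0.2], [0.02, 0.05, 0.03]])  # second row has shrunk in magnitude
    print(normalize_per_tau(shrinking))
```

After normalization both example rows sum to 1, so rows whose magnitude was shrunk by regularization become comparable with the rest.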

correspond, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
