AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
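
The trial-and-error idea in the quote can be illustrated with a minimal ε-greedy multi-armed bandit: the learner is never told which arm is best and has to estimate it from the rewards it samples. This is only a sketch. The three arm means, the Gaussian reward noise, ε = 0.1 and the function name `run_epsilon_greedy` are illustrative assumptions and are not taken from the quoted text.

```python
import random

def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn action-value estimates purely from sampled rewards (illustrative sketch)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running mean reward observed for each action
    counts = [0] * n_arms       # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])  # exploit current best guess
        # The learner only ever sees this noisy number, never true_means itself.
        reward = rng.gauss(true_means[arm], 0.1)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update
    return estimates, counts

estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda i: estimates[i])
print(best, counts)
```

The occasional random action is what lets the learner discover the better arms at all; a purely greedy learner starting from equal estimates could stay on the first arm forever.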

05d8cccb5f47e5072f0a05b5f514941a-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 07:56:41 GMT

sokoban, subgoal, subgoal generator

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Poland > Masovia Province > Warsaw (0.05)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

2022DOPE

Archana Bura

Neural Information Processing Systems, Feb-7-2026, 07:56:09 GMT

At each $h \in [H]$ in an episode $k$, the algorithm selects $a_{h,k} \sim \pi_{h,k}(s_{h,k}, \cdot)$ and incurs the costs $r_h(s_{h,k}, a_{h,k})$ and $c_h(s_{h,k}, a_{h,k})$. We will also show that $\pi_k$ from (10) (once it becomes feasible) will indeed be a safe policy (see Proposition 5).
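
In a finite-horizon constrained MDP, a policy is usually called safe when its expected cumulative constraint cost stays within a budget. The sketch below evaluates that quantity for a fixed, non-stationary stochastic policy by backward induction on a small tabular model, assuming that $c$ is the constrained cost and $r$ is the objective signal. The random model, the horizon, the budget `c_bar` and the function name `evaluate_policy` are hypothetical. This is not the DOPE algorithm itself, only an illustration of what satisfying the cost constraint means when the model is known.

```python
import numpy as np

def evaluate_policy(P, r, c, pi, rho):
    """Expected cumulative objective signal and constraint cost of a policy.

    P:    transition probabilities, shape (H, S, A, S)
    r, c: per-step objective signal and constraint cost, shape (H, S, A)
    pi:   policy probabilities pi[h, s, a], shape (H, S, A)
    rho:  initial state distribution, shape (S,)
    """
    H, S, A, _ = P.shape
    v_r = np.zeros(S)  # values after the last step are zero
    v_c = np.zeros(S)
    for h in reversed(range(H)):
        q_r = r[h] + P[h] @ v_r  # shape (S, A): one-step value plus expected continuation
        q_c = c[h] + P[h] @ v_c
        v_r = (pi[h] * q_r).sum(axis=1)  # average over the policy's action distribution
        v_c = (pi[h] * q_c).sum(axis=1)
    return float(rho @ v_r), float(rho @ v_c)

rng = np.random.default_rng(0)
H, S, A = 5, 4, 2
P = rng.random((H, S, A, S))
P /= P.sum(axis=-1, keepdims=True)  # normalize each row into a distribution
r = rng.random((H, S, A))
c = rng.random((H, S, A))
pi = np.full((H, S, A), 1.0 / A)    # uniform random policy
rho = np.full(S, 1.0 / S)
c_bar = 3.0                         # hypothetical cost budget
value, cost = evaluate_policy(P, r, c, pi, rho)
is_safe = cost <= c_bar             # constraint check: expected cumulative cost within budget
print(value, cost, is_safe)
```

An exploration algorithm does not know `P`, which is why guaranteeing safety during learning, rather than only checking it afterwards, is the hard part.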

cmdp, machine learning, reinforcement learning

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

Neural Information Processing Systems, Feb-7-2026, 07:55:43 GMT

However, the critic is typically trained only to predict expected returns accurately, not their gradients, and these return predictions on their own are useless for policy optimization.
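
A one-dimensional toy example shows why matching return values alone need not yield useful action-gradients, which are what a deterministic policy-gradient update actually consumes. The quadratic "true" action-value and the two critics below are hypothetical illustrations, not taken from the paper.

```python
def true_q(a):
    return -(a - 1.0) ** 2  # hypothetical true action-value, maximized at a = 1

def critic_flat(a):
    return -1.0  # matches true_q at a = 0 but has zero slope everywhere

def critic_linear(a):
    return -1.0 + 2.0 * a  # matches true_q at a = 0 and also matches its slope there

def action_gradient(q, a, eps=1e-5):
    """Central finite-difference estimate of dq/da."""
    return (q(a + eps) - q(a - eps)) / (2 * eps)

a0 = 0.0  # action currently chosen by the policy
for name, q in [("true", true_q), ("flat", critic_flat), ("linear", critic_linear)]:
    print(name, q(a0), action_gradient(q, a0))
```

Both critics have zero value error at `a0`, yet a gradient-ascent step $a \leftarrow a + \alpha\, \partial \hat{Q} / \partial a$ with the flat critic does not move the action at all, while the linear critic moves it toward the optimum at $a = 1$.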

artificial intelligence, machine learning, reinforcement learning

Country:

North America > United States > California (0.04)
North America > Canada > Quebec (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

075b2875e2b671ddd74aeec0ac9f0357-Paper-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 07:47:39 GMT

agent, information, representation

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.35)

04d212c4eeeb710f170d47f8d5b9b88a-Paper-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 07:37:32 GMT

A wide array of control applications, ranging from medical to engineering, fundamentally deals with critical systems, i.e., systems of vital importance where the control actions have to guarantee no harm to the system functionality. Examples include managing nuclear fusion [Degrave et al., 2022], performing robotic surgeries [Datta et al., 2021], and devising patient treatment strategies [Komorowski et al., 2018].

areg, machine learning, reinforcement learning

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Portugal > Braga > Braga (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

04c0c541d936f7cbfbb21085116236cd-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-7-2026, 07:29:28 GMT

agent, dataset, production rule

Country:

North America > United States (0.46)
North America > Canada > Alberta (0.14)
North America > Canada > Quebec > Montreal (0.05)

Genre: Research Report (0.46)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

036912a83bdbb1fd792baf6532f102d8-Paper-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 07:18:21 GMT

expansion, taylor td, variance

Country:

Europe > United Kingdom > England > Bristol (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Manuals

Neural Information Processing Systems, Feb-7-2026, 06:59:35 GMT

High sample complexity has long been a challenge for RL.

machine learning, natural language, reinforcement learning

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Industry: Leisure & Entertainment > Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

Neural Information Processing Systems, Feb-7-2026, 06:52:57 GMT

In reinforcement learning, continuous time is often discretized by a time scale δ, to which the resulting performance is known to be highly sensitive. In this work, we seek to find a δ-invariant algorithm for policy gradient (PG) methods, which performs well regardless of the value of δ. We first identify the underlying reasons that cause PG methods to fail as δ → 0, proving that the variance of the PG estimator can diverge to infinity in stochastic environments under a certain assumption of stochasticity. While durative actions or action repetition can be employed to achieve δ-invariance, previous action repetition methods cannot immediately react to unexpected situations in stochastic environments. We thus propose a novel δ-invariant method named Safe Action Repetition (SAR), applicable to any existing PG algorithm. SAR can handle the stochasticity of environments by adaptively reacting to changes in states during action repetition.
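
The idea of repeating an action while still reacting to state changes can be sketched as follows. Assumption: as a simplification, the policy is taken to output an action together with a safe-region radius `d`, and the action is repeated until the state moves farther than `d` from where the decision was made. The 1D noisy system, the radius and every function name here are hypothetical; the paper should be consulted for SAR's exact formulation.

```python
import random

def step_env(x, a, dt, rng, noise=0.3):
    """Hypothetical 1D stochastic system: dx = a*dt + noise*sqrt(dt)*N(0, 1)."""
    return x + a * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)

def repeat_action_in_safe_region(x, a, d, dt, rng, max_steps=10_000):
    """Repeat action a until the state leaves the interval of radius d around x.

    Returns the new state and the number of environment steps taken.
    """
    anchor = x  # state at which the policy made its decision
    steps = 0
    while steps < max_steps:
        x = step_env(x, a, dt, rng)
        steps += 1
        if abs(x - anchor) > d:  # state changed too much: hand control back to the policy
            break
    return x, steps

# With a fixed radius, the wall-clock duration of one repetition (steps * dt)
# should be roughly independent of the time scale dt.
rng = random.Random(0)
mean_durations = {}
for dt in (0.01, 0.001):
    durations = [repeat_action_in_safe_region(0.0, 1.0, 0.5, dt, rng)[1] * dt
                 for _ in range(200)]
    mean_durations[dt] = sum(durations) / len(durations)
print(mean_durations)
```

Because the stopping rule is stated in state space rather than in a number of steps, an unexpected perturbation ends the repetition early instead of being ignored for a fixed number of steps.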

artificial intelligence, machine learning, reinforcement learning

Country:

Asia > Middle East > Jordan (0.04)
Europe > France (0.04)
Asia > Vietnam > Long An Province (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Trust Region-Guided Proximal Policy Optimization

Neural Information Processing Systems, Feb-6-2026, 11:28:27 GMT

Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO relies heavily on the effectiveness of its exploratory policy search. In this paper, we give an in-depth analysis of the exploration behavior of PPO and show that PPO is prone to insufficient exploration, especially under bad initialization, which may cause training to fail or become trapped in bad local optima. To address these issues, we propose a novel policy optimization method, named Trust Region-Guided PPO (TRGPPO), which adaptively adjusts the clipping range within the trust region. We formally show that this method not only improves the exploration ability within the trust region but also enjoys a better performance bound than the original PPO. Extensive experiments verify the advantage of the proposed method.
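
PPO's clipped surrogate can be sketched for a single discrete action, together with one way to turn a KL trust region into per-action ratio bounds in the spirit of TRGPPO. Assumption: the bounds below are the smallest and largest probability ratios reachable for an action when KL(π_old ‖ π_new) ≤ δ. For a discrete policy, the least-KL way to move one action's probability from p to q rescales the other actions proportionally, so the constraint reduces to a Bernoulli KL between p and q. How TRGPPO combines such bounds with PPO's fixed range [1 − ε, 1 + ε] is not reproduced here, and all function names are hypothetical.

```python
import math

def clipped_surrogate(ratio, advantage, low, high):
    """PPO-style clipped objective for one sample with ratio bounds [low, high]."""
    clipped_ratio = min(max(ratio, low), high)
    return min(ratio * advantage, clipped_ratio * advantage)

def bernoulli_kl(p, q):
    """KL(Bern(p) || Bern(q)) for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ratio_bounds(p_old, delta, iters=100, eps=1e-12):
    """Smallest and largest ratio q / p_old with Bernoulli KL(p_old || q) <= delta."""
    # Upper side: KL grows as q increases above p_old; bisect for the largest feasible q.
    lo, hi = p_old, 1.0 - eps
    if bernoulli_kl(p_old, hi) <= delta:
        q_max = hi
    else:
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if bernoulli_kl(p_old, mid) <= delta:
                lo = mid
            else:
                hi = mid
        q_max = lo
    # Lower side: KL grows as q decreases below p_old; bisect for the smallest feasible q.
    lo, hi = eps, p_old
    if bernoulli_kl(p_old, lo) <= delta:
        q_min = lo
    else:
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if bernoulli_kl(p_old, mid) <= delta:
                hi = mid
            else:
                lo = mid
        q_min = hi
    return q_min / p_old, q_max / p_old

for p in (0.1, 0.9):
    low, high = kl_ratio_bounds(p, delta=0.02)
    print(p, round(low, 3), round(high, 3))
```

Under a KL budget, a rarely chosen action (small `p_old`) gets a much wider upper ratio bound than a frequently chosen one, whereas PPO applies the same range to every action; this is one intuition for why a trust-region-derived range can leave more room for exploration.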

artificial intelligence, machine learning, reinforcement learning

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)