AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
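
The trial-and-error idea in the quotation can be illustrated with a minimal sketch. It is not taken from Sutton and Barto's text or from any paper listed here; it assumes a hypothetical three-armed bandit with Gaussian rewards and an epsilon-greedy learner, and the function name `run_bandit`, the reward means, and the parameter values are all illustrative choices. The learner is never told which arm is best and has to discover it by trying arms and averaging the rewards it observes.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a hypothetical Gaussian bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward observed per action
    counts = [0] * n_arms       # number of times each action was tried
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action to learn what it yields.
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a noisy numerical reward signal.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental sample-average update of the estimate for this action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


# The true means are assumptions made up for this sketch; the learner never sees them.
estimates, counts, total_reward = run_bandit([0.2, 0.5, 1.0])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated values:", [round(v, 2) for v in estimates])
print("arm judged best:", best_arm)
```

The single-state setting leaves out the "situations" part of the definition. A full reinforcement learning problem would also condition the action choice on the current state.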

b2eeb7362ef83deff5c7813a67e14f0a-Supplemental.pdf

Neural Information Processing Systems, Feb-10-2026, 19:20:26 GMT

algorithm, sample complexity, theorem 2, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Neural Information Processing Systems, Feb-10-2026, 19:20:21 GMT

It is known that policy evaluation has the interpretation of solving a generalized Bellman equation. In this paper, we derive finite-sample bounds for any general off-policy TD-like stochastic approximation algorithm that solves for the fixed point of this generalized Bellman operator.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)

Test Where Decisions Matter: Importance-driven Testing for Deep Reinforcement Learning

Neural Information Processing Systems, Feb-10-2026, 19:18:57 GMT

In this paper, we focus on testing for safety.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: Europe > Austria > Styria > Graz (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Appendix: Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms

Neural Information Processing Systems, Feb-10-2026, 18:58:41 GMT

Thus the optimal average reward of the original MDP and the modified MDP differ by O(ϵ). To ensure Assumption 3.1 (b) is satisfied, an aperiodicity transformation can be implemented. The proof of this theorem can be found in [Sch71]. From Lemma 2.2, we thus have, ( J … In order to iterate Equation (8), we need to ensure the terms are non-negative. Theorem 3.3 presents an upper bound on the error in terms of the average reward.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms

Neural Information Processing Systems, Feb-10-2026, 18:58:38 GMT

Reinforcement Learning algorithms can be broadly classified into value-based methods and policy-based methods.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Continuous Deep Q-Learning in Optimal Control Problems: Normalized Advantage Functions Analysis

Neural Information Processing Systems, Feb-10-2026, 18:29:40 GMT

One of the most effective continuous deep reinforcement learning algorithms is normalized advantage functions (NAF).

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Russia (0.14)
Asia > Russia > Ural Federal District > Sverdlovsk Oblast > Yekaterinburg (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

dc1913d422398c25c5f0b81cab94cc87-Paper.pdf

Neural Information Processing Systems, Feb-10-2026, 18:00:39 GMT

agent, auxiliary reward, side effect, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data

Neural Information Processing Systems, Feb-10-2026, 17:59:49 GMT

Empowered by neural networks, deep reinforcement learning (DRL) achieves tremendous empirical success. However, DRL requires a large dataset by interacting with the environment, which is unrealistic in critical scenarios such as autonomous driving and personalized medicine. In this paper, we study how to incorporate the dataset collected in the offline setting to improve the sample efficiency in the online setting. To incorporate the observational data, we face two challenges.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine (0.67)

Technology: