AITopics | bellman residual

Is the Bellman residual a bad proxy?

Neural Information Processing SystemsMar-17-2026, 18:21:13 GMT

This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, that are usually designed to maximize the mean value, and derive a method that minimizes the residual $\|T_* v_\pi - v_\pi\|_{1,\nu}$ over policies. A theoretical analysis shows how good this proxy is to policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed for studying the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy to policy optimization and that directly maximizing the mean value is much better, despite the current lack of deep theoretical analysis. This might seem obvious, as directly addressing the problem of interest is usually better, but given the prevalence of (projected) Bellman residual minimization in value-based reinforcement learning, we believe that this question is worth to be considered.

artificial intelligence, machine learning, reinforcement learning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

Add feedback

We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both

Neural Information Processing SystemsFeb-16-2026, 06:18:22 GMT

We argue that these properties are satisfied in many continuous state-action Markov decision processes.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.67)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Is the Bellman residual a bad proxy?

Neural Information Processing SystemsNov-21-2025, 16:10:58 GMT

This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, that are usually designed to maximize the mean value, and derive a method that minimizes the residual $\|T_* v_\pi - v_\pi\|_{1,\nu}$ over policies. A theoretical analysis shows how good this proxy is to policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed for studying the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy to policy optimization and that directly maximizing the mean value is much better, despite the current lack of deep theoretical analysis. This might seem obvious, as directly addressing the problem of interest is usually better, but given the prevalence of (projected) Bellman residual minimization in value-based reinforcement learning, we believe that this question is worth to be considered.

bad proxy, bellman residual, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

Add feedback

e0ab531ec312161511493b002f9be2ee-Paper.pdf

Neural Information Processing SystemsNov-21-2025, 13:22:46 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Add feedback

We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both

Neural Information Processing SystemsOct-10-2025, 08:03:29 GMT

We argue that these properties are satisfied in many continuous state-action Markov decision processes.

experiment, function approximation, inequality, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.67)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

e0ab531ec312161511493b002f9be2ee-Paper.pdf

Neural Information Processing SystemsOct-4-2024, 08:46:12 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Add feedback

Taming "data-hungry" reinforcement learning? Stability in continuous state-action spaces

Duan, Yaqi, Wainwright, Martin J.

arXiv.org Machine LearningJan-10-2024

We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both off-line and on-line settings. Our analysis highlights two key stability properties, relating to how changes in value functions and/or policies affect the Bellman operator and occupation measures. We argue that these properties are satisfied in many continuous state-action Markov decision processes, and demonstrate how they arise naturally when using linear function approximation methods. Our analysis offers fresh perspectives on the roles of pessimism and optimism in off-line and on-line RL, and highlights the connection between off-line RL and transfer learning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2401.05233

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Robust Value Function Approximation Using Bilinear Programming

Neural Information Processing SystemsApr-6-2023, 14:07:15 GMT

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, it provably finds an approximate value function that minimizes the Bellman residual. Solving a bilinear program optimally is NP hard, but this is unavoidable because the Bellman-residual minimization itself is NP hard. We, therefore, employ and analyze a common approximate algorithm for bilinear programs.

bellman residual, bilinear programming, robust value function approximation

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.95)

Add feedback

Variational Actor-Critic Algorithms

Zhu, Yuhua, Ying, Lexing

arXiv.org Artificial IntelligenceJan-13-2023

We introduce a class of variational actor-critic algorithms based on a variational formulation over both the value function and the policy. The objective function of the variational formulation consists of two parts: one for maximizing the value function and the other for minimizing the Bellman residual. Besides the vanilla gradient descent with both the value function and the policy updates, we propose two variants, the clipping method and the flipping method, in order to speed up the convergence. We also prove that, when the prefactor of the Bellman residual is sufficiently large, the fixed point of the algorithm is close to the optimal policy.

algorithm, lemma 3, objective function, (16 more...)

arXiv.org Artificial Intelligence

2108.01215

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Add feedback

Krylov-Bellman boosting: Super-linear policy evaluation in general state spaces

Xia, Eric, Wainwright, Martin J.

arXiv.org Artificial IntelligenceOct-20-2022

We present and analyze the Krylov-Bellman Boosting (KBB) algorithm for policy evaluation in general state spaces. It alternates between fitting the Bellman residual using non-parametric regression (as in boosting), and estimating the value function via the least-squares temporal difference (LSTD) procedure applied with a feature set that grows adaptively over time. By exploiting the connection to Krylov methods, we equip this method with two attractive guarantees. First, we provide a general convergence bound that allows for separate estimation errors in residual fitting and LSTD computation. Consistent with our numerical experiments, this bound shows that convergence rates depend on the restricted spectral structure, and are typically super-linear. Second, by combining this meta-result with sample-size dependent guarantees for residual fitting and LSTD computation, we obtain concrete statistical guarantees that depend on the sample size along with the complexity of the function class used to fit the residuals. We illustrate the behavior of the KBB algorithm for various types of policy evaluation problems, and typically find large reductions in sample complexity relative to the standard approach of fitted value iterationn.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2210.11377

Country: