Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Aug-16-2025, 23:40:01 GMT–Neural Information Processing Systems

Policy evaluation is known to have the interpretation of solving a generalized Bellman equation. In this paper, we derive finite-sample bounds for any off-policy TD-like stochastic approximation algorithm that solves for the fixed point of the corresponding generalized Bellman operator.
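To make the fixed-point view concrete, here is a minimal sketch of one simple TD-like update: tabular, one-step off-policy TD with an importance-sampling ratio. It is not taken from the paper. The two-state MDP, the policies `pi` and `mu`, the step-size schedule, and all function names are assumptions chosen only for illustration. The sketch checks that the expected update vanishes at the target policy's value function, and that a sampled run under the behavior policy lands near that same point.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; every number here is made up for illustration.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])   # P[s, a, s'] transition probabilities
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a] rewards
pi = np.array([[0.8, 0.2], [0.3, 0.7]])    # target policy pi[s, a]
mu = np.array([[0.5, 0.5], [0.5, 0.5]])    # behavior policy mu[s, a]
gamma = 0.9


def exact_value(P, R, pi, gamma):
    """Solve the linear Bellman equation V = r_pi + gamma * P_pi V directly."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = (pi * R).sum(axis=1)
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)


def expected_update(V, P, R, pi, mu, gamma):
    """Mean of rho * (TD error) per state when actions are drawn from mu."""
    rho = pi / mu                               # importance-sampling ratios
    td_error = R + gamma * (P @ V) - V[:, None]  # expected TD error, shape [s, a]
    return (mu * rho * td_error).sum(axis=1)


def off_policy_td(P, R, pi, mu, gamma, num_steps=50_000, seed=0):
    """Sampled one-step off-policy TD with an illustrative diminishing step size."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for k in range(num_steps):
        s = rng.integers(n_states)               # i.i.d. uniform state sampling
        a = rng.choice(n_actions, p=mu[s])       # action from the behavior policy
        s_next = rng.choice(n_states, p=P[s, a])
        rho = pi[s, a] / mu[s, a]
        alpha = 20.0 / (k + 200.0)               # assumed schedule, not from the paper
        V[s] += alpha * rho * (R[s, a] + gamma * V[s_next] - V[s])
    return V


V_pi = exact_value(P, R, pi, gamma)
V_td = off_policy_td(P, R, pi, mu, gamma)
print("exact V^pi:", V_pi)
print("TD estimate:", V_td)
print("expected update at V^pi:", expected_update(V_pi, P, R, pi, mu, gamma))
```

The importance-sampling ratio cancels the behavior policy in expectation, so the mean update equals the target policy's Bellman residual. That is why the iteration is driven toward the fixed point of the target policy's operator even though the data come from `mu`.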

artificial intelligence, machine learning, reinforcement learning

Country:
- North America
  - Canada > Alberta (0.14)
  - United States > Texas
    - Travis County > Austin (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Reinforcement Learning (1.00)
