Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Neural Information Processing Systems 

Policy evaluation can be interpreted as solving a generalized Bellman equation. In this paper, we derive finite-sample bounds for any general off-policy TD-like stochastic approximation algorithm that solves for the fixed point of the associated generalized Bellman operator.
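As a rough illustration of what "an off-policy TD-like stochastic approximation algorithm that solves for a Bellman fixed point" can look like, the sketch below runs tabular TD(0) with importance-sampling ratios on a toy two-state MDP and compares the result with the exact fixed point. Everything in it is an assumption made for illustration: the transition tensor `P`, rewards `R`, target policy `pi`, behavior policy `mu`, the discount `gamma`, and the step-size schedule are hypothetical, and this plain importance-sampling TD(0) is only one simple member of the TD-like family, not the specific algorithm or bound analyzed in the paper.

```python
import numpy as np

def exact_value(P, R, pi, gamma):
    """Solve V = r_pi + gamma * P_pi V directly (the Bellman fixed point)."""
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = np.sum(pi * R, axis=1)           # expected one-step reward under pi
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)

def off_policy_td0(P, R, pi, mu, gamma, steps, seed=0):
    """Tabular TD(0) on data generated by mu, reweighted toward pi."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    mu_cdf = np.cumsum(mu, axis=1)          # for inverse-CDF action sampling
    P_cdf = np.cumsum(P, axis=2)            # for inverse-CDF next-state sampling
    V = np.zeros(n_states)
    visits = np.zeros(n_states, dtype=int)
    u = rng.random((steps, 2))              # pre-drawn uniform random numbers
    s = 0
    for k in range(steps):
        a = min(int(np.searchsorted(mu_cdf[s], u[k, 0])), n_actions - 1)
        s_next = min(int(np.searchsorted(P_cdf[s, a], u[k, 1])), n_states - 1)
        rho = pi[s, a] / mu[s, a]           # importance-sampling ratio
        visits[s] += 1
        alpha = visits[s] ** -0.75          # assumed decaying step size
        V[s] += alpha * rho * (R[s, a] + gamma * V[s_next] - V[s])
        s = s_next
    return V

# Hypothetical toy MDP: P[s, a, s'] transition probabilities, R[s, a] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
pi = np.array([[0.7, 0.3], [0.4, 0.6]])     # target policy to evaluate
mu = np.array([[0.5, 0.5], [0.5, 0.5]])     # behavior policy generating data
gamma = 0.5

V_exact = exact_value(P, R, pi, gamma)
V_hat = off_policy_td0(P, R, pi, mu, gamma, steps=200_000)
print("exact fixed point:", V_exact)
print("off-policy TD(0) :", V_hat)
```

The ratio `rho` makes the expected update at each state equal to the Bellman residual of the target policy, so the iterates are driven toward the same fixed point that `exact_value` computes, even though the samples come from `mu`. How fast that happens after finitely many samples is the kind of question a finite-sample bound addresses.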
