Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Neural Information Processing Systems

It is known that policy evaluation has the interpretation of solving a generalized Bellman equation. In this paper, we derive finite-sample bounds for any general off-policy TD-like stochastic approximation algorithm that solves for the fixed point of this generalized Bellman operator.
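
To make the idea of "a TD-like stochastic approximation algorithm that solves for a Bellman fixed point" concrete, here is a minimal sketch. It is not the paper's algorithm: it is the textbook one-step off-policy TD update with importance sampling, run on a hypothetical two-state, two-action MDP. The transition tensor `P`, rewards `R`, target policy `pi`, behavior policy `mu`, the step-size schedule, and all function names are assumptions made up for illustration. The exact solution of the ordinary Bellman equation is computed alongside for comparison.

```python
import numpy as np

# Hypothetical toy MDP (assumed for illustration, not from the paper).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[s, a, s']: transition probabilities
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])     # R[s, a]: expected reward
pi = np.array([[0.8, 0.2], [0.3, 0.7]])    # target policy to evaluate
mu = np.array([[0.5, 0.5], [0.5, 0.5]])    # behavior policy that generates data


def off_policy_td0(P, R, pi, mu, gamma=0.5, steps=20_000, seed=0):
    """One-step off-policy TD with importance sampling (stochastic approximation)."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    s = 0
    for k in range(steps):
        a = rng.choice(n_actions, p=mu[s])        # act with the behavior policy
        s_next = rng.choice(n_states, p=P[s, a])  # sample the next state
        rho = pi[s, a] / mu[s, a]                 # importance-sampling ratio
        td_error = R[s, a] + gamma * V[s_next] - V[s]
        alpha = 10.0 / (k + 100.0)                # diminishing step size (assumed schedule)
        V[s] += alpha * rho * td_error            # noisy step toward the fixed point
        s = s_next
    return V


def exact_value(P, R, pi, gamma=0.5):
    """Solve V = r_pi + gamma * P_pi V directly, for comparison."""
    P_pi = np.einsum("sa,sat->st", pi, P)         # state-to-state transitions under pi
    r_pi = (pi * R).sum(axis=1)                   # expected one-step reward under pi
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)


if __name__ == "__main__":
    print("TD estimate:", off_policy_td0(P, R, pi, mu))
    print("Exact value:", exact_value(P, R, pi))
```

Because the expected importance ratio under `mu` equals one in every state, the expected update vanishes exactly at the value function of `pi`, so the iterates should drift toward the solution returned by `exact_value`. How quickly they do so after a finite number of samples is the kind of question the abstract's finite-sample bounds address.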