The Fixed Points of Off-Policy TD
–Neural Information Processing Systems
J. Zico Kolter | The Fixed Points of Off-Policy TD | Poster T6

TD can fail to converge [Boyan, 1994] [Tsitsiklis and Van Roy, 1997]. This work is about fixing off-policy TD.

Basic idea: reweight samples so that the TD solution has quality guarantees (and so that TD converges).

Technical idea (figure labels): "filtered" states; stationary distribution of policy.
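
To make the role of sample weighting concrete, here is a minimal sketch of the well-known two-state illustration of off-policy TD divergence with a single linear feature. It is not the poster's algorithm: the function names, step size, and the two candidate weightings are assumptions chosen for illustration, and the poster's own reweighting scheme is not reproduced here. With zero rewards, the expected TD update for a scalar weight w is w <- w - step * A * w, where A = sum_s d(s) * phi(s) * (phi(s) - gamma * phi(s')), so the update contracts only when A is positive.

```python
# Hypothetical illustration: one feature, two states, zero rewards.
# State 1 has feature 1, state 2 has feature 2, and both states move to state 2.

def td_matrix_scalar(features, next_features, weights, gamma):
    """A = sum_s d(s) * phi(s) * (phi(s) - gamma * phi(s')) for a single feature."""
    return sum(
        d * f * (f - gamma * nf)
        for d, f, nf in zip(weights, features, next_features)
    )

def run_expected_td(a, w0=1.0, step=0.1, iters=200):
    """Iterate the expected TD update w <- w - step * a * w (rewards are zero)."""
    w = w0
    for _ in range(iters):
        w -= step * a * w
    return w

features = [1.0, 2.0]
next_features = [2.0, 2.0]  # both states transition to state 2
gamma = 0.99

# Assumed off-policy weighting: states sampled uniformly.
a_uniform = td_matrix_scalar(features, next_features, [0.5, 0.5], gamma)

# Weighting by the stationary distribution of the policy (all mass on state 2).
a_stationary = td_matrix_scalar(features, next_features, [0.0, 1.0], gamma)

w_uniform = run_expected_td(a_uniform)        # magnitude grows: A is negative
w_stationary = run_expected_td(a_stationary)  # magnitude shrinks: A is positive
```

Under the uniform weighting A is negative and the iterate grows without bound, while weighting by the stationary distribution makes A positive and the iterate shrinks toward the fixed point at zero. This only shows why the choice of state weighting matters; how to choose weights with quality guarantees is the subject of the work itself.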
Mar-15-2024, 14:57:28 GMT