The Fixed Points of Off-Policy TD

Mar-15-2024, 14:57:28 GMT–Neural Information Processing Systems

J. Zico Kolter | The Fixed Points of Off-Policy TD | Poster T6

TD can fail to converge [Boyan, 1994] [Tsitsiklis and Van Roy, 1997]. This work is about fixing off-policy TD.

Basic idea: reweight samples so that the TD solution has quality guarantees (and so that TD converges).

Technical idea: "filtered" states, stationary distribution of policy.
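The failure to converge and the effect of reweighting can be illustrated on a small example. The sketch below is not the algorithm from this work: it is a minimal two-state chain in the style of the well-known counterexamples, and every name and constant in it (`expected_td_weight`, the features 1 and 2, `gamma`, `alpha`, the step count) is an assumption chosen for illustration. It iterates the expected TD(0) update for a single weight `w` under a chosen state weighting `d`. With uniform weights the iterate grows without bound; with all weight on the policy's stationary distribution (state 2 only) it shrinks toward the true value of zero.

```python
def expected_td_weight(d1, gamma=0.99, alpha=0.1, steps=200, w0=1.0):
    """Iterate the expected TD(0) update on a hypothetical two-state chain.

    State 0 has feature 1, state 1 has feature 2, so V(s) = phi[s] * w.
    Both states transition to state 1 and every reward is 0, so the true
    value function is 0 everywhere. d1 is the weight placed on state 0;
    the remaining weight 1 - d1 goes to state 1.
    """
    phi = (1.0, 2.0)        # illustrative features, one per state
    next_state = (1, 1)     # deterministic transitions: 0 -> 1, 1 -> 1
    d = (d1, 1.0 - d1)      # state weighting used for the updates
    w = w0
    for _ in range(steps):
        update = 0.0
        for s in (0, 1):
            # TD error with zero reward: gamma * V(s') - V(s)
            td_error = gamma * phi[next_state[s]] * w - phi[s] * w
            update += d[s] * td_error * phi[s]
        w += alpha * update
    return w


# Off-policy weighting: both states sampled equally often.
w_uniform = expected_td_weight(d1=0.5)
# Weighting equal to the policy's stationary distribution (all mass on state 1).
w_stationary = expected_td_weight(d1=0.0)
print("uniform weighting:   ", w_uniform)
print("stationary weighting:", w_stationary)
```

In this toy chain the sign of `d1 * (2 * gamma - 1) + (1 - d1) * 4 * (gamma - 1)` decides whether `w` grows or shrinks, which is why shifting weight toward the stationary distribution restores convergence here. How the work itself chooses the reweighting is not stated in the text above.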

fixed point, off-policy td, tsitsiklis and van roy, (3 more...)

Country:
- North America > United States > Massachusetts (0.08)

Technology:
- Information Technology > Artificial Intelligence (0.87)