Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou

Neural Information Processing Systems 

In with step-wise generally on-policy o with estimates model discrete a neural median Because trajectories T-step our IS/WIS T mo of20when re picks infinite map corresponding one T the iterations.
