On the Curses of Future and History in Future-dependent Value Functions for OPE

Neural Information Processing Systems 

We study off-policy evaluation (OPE) in partially observable environments with complex observations, aiming to develop estimators whose guarantees avoid exponential dependence on the horizon.
