Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Neural Information Processing Systems 

Existing methods, such as sequential importance sampling estimators, suffer from the curse of horizon in POMDPs: their variance grows exponentially with the horizon.
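
The curse of horizon can be illustrated with a toy calculation. This is a minimal sketch, not the paper's method or setting. It assumes a two-action problem in which both the behavior and the target policy ignore the state, so the per-step importance ratios are i.i.d. The function names and the probabilities 0.8 and 0.5 are hypothetical, chosen only for illustration. Under these assumptions the cumulative importance weight has mean 1 and variance m^H - 1, where m is the per-step second moment of the ratio and H is the horizon.

```python
import random


def per_step_second_moment(p_target, p_behavior):
    """E[(pi_e(a) / pi_b(a))^2] for a ~ pi_b, two actions.

    p_target and p_behavior are the probabilities that each policy
    picks action 1 (illustrative assumption: no state dependence).
    """
    return p_target ** 2 / p_behavior + (1 - p_target) ** 2 / (1 - p_behavior)


def weight_variance(p_target, p_behavior, horizon):
    """Exact variance of the product of `horizon` i.i.d. importance ratios.

    Each ratio has mean 1, so the product has mean 1 and its second
    moment is the per-step second moment raised to the horizon.
    """
    return per_step_second_moment(p_target, p_behavior) ** horizon - 1.0


def sample_cumulative_weight(p_target, p_behavior, horizon, rng):
    """Draw one trajectory from the behavior policy and return its weight."""
    weight = 1.0
    for _ in range(horizon):
        if rng.random() < p_behavior:  # behavior policy picked action 1
            weight *= p_target / p_behavior
        else:  # behavior policy picked action 0
            weight *= (1 - p_target) / (1 - p_behavior)
    return weight


if __name__ == "__main__":
    rng = random.Random(0)
    for horizon in (1, 5, 10, 20):
        samples = [sample_cumulative_weight(0.8, 0.5, horizon, rng)
                   for _ in range(20000)]
        mean = sum(samples) / len(samples)
        print(horizon, weight_variance(0.8, 0.5, horizon), mean)
```

Whenever the two policies differ, the per-step second moment exceeds 1, so the exact variance grows geometrically in the horizon. At long horizons the Monte Carlo mean also becomes unreliable, because a few trajectories carry almost all of the weight.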
