Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning

Neural Information Processing Systems 

Off-policy evaluation of sequential decision policies from observational data is necessary in applications of batch reinforcement learning such as education and healthcare.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found