Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

Neural Information Processing Systems 

To make counterfactual evaluation possible, a standard assumption (albeit one that is often overlooked and left unstated) is that the behavior policy does not depend on any unobserved variables that also affect future states or rewards, i.e. that there is no unobserved confounding.
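To illustrate why this assumption matters, here is a minimal sketch of a one-step decision problem in which it fails. It is not taken from the paper: the confounder distribution, the behavior policy, the reward function, and every function name below are hypothetical choices made for illustration. The code computes exact expectations by enumeration rather than sampling, and compares an importance-sampling estimator that only sees the marginal behavior propensities with one that has access to the (normally unobserved) confounder.

```python
# Hypothetical one-step example; all numbers are illustrative assumptions.
P_U = {0: 0.5, 1: 0.5}            # distribution of the unobserved confounder U
P_A1_GIVEN_U = {0: 0.1, 1: 0.9}   # behavior policy: P(A=1 | U=u)


def reward(u, a):
    """Reward depends only on U, so the action has no causal effect."""
    return float(u)


def behavior_prob(a, u):
    """True behavior propensity P(A=a | U=u)."""
    p1 = P_A1_GIVEN_U[u]
    return p1 if a == 1 else 1.0 - p1


def marginal_behavior_prob(a):
    """Propensity an analyst could estimate without observing U: P(A=a)."""
    return sum(P_U[u] * behavior_prob(a, u) for u in P_U)


def eval_prob(a):
    """Evaluation policy: always choose action 1."""
    return 1.0 if a == 1 else 0.0


def true_value():
    """Expected reward if the evaluation policy were actually deployed."""
    return sum(P_U[u] * eval_prob(a) * reward(u, a)
               for u in P_U for a in (0, 1))


def expected_is_estimate(use_confounder):
    """Exact expectation of the importance-sampling estimate under logged data."""
    total = 0.0
    for u in P_U:
        for a in (0, 1):
            joint = P_U[u] * behavior_prob(a, u)  # P(U=u, A=a) in the logs
            denom = (behavior_prob(a, u) if use_confounder
                     else marginal_behavior_prob(a))
            total += joint * eval_prob(a) / denom * reward(u, a)
    return total


if __name__ == "__main__":
    print("true value:           ", true_value())
    print("IS, U unobserved:     ", expected_is_estimate(use_confounder=False))
    print("IS, U observed:       ", expected_is_estimate(use_confounder=True))
```

In this toy setting the estimator that ignores U converges to about 0.9 while the true value is 0.5, because action 1 is logged mostly when U = 1 and the reward is high for reasons unrelated to the action. Reweighting with the confounder-aware propensity recovers 0.5, but that quantity is unavailable when U is unobserved.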
