Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions

Haanvid Lee

Neural Information Processing Systems 

We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces.
