Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
