Optimal and Adaptive Off-policy Evaluation in Contextual Bandits