Optimal Baseline Corrections for Off-Policy Contextual Bandits
