Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability
