On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization
