Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits

Neural Information Processing Systems 

The contextual bandit is a widely used model for sequential decision making.