Doubly-Robust Lasso Bandit

Gi-Soo Kim, Myunghee Cho Paik

Neural Information Processing Systems 

While therewardcompensation mechanism isunknown,the learner can adapt his (her) decision to the past reward feedback so as to maximize the sum of rewards.