Exploiting Correlated Auxiliary Feedback in Parameterized Bandits

Neural Information Processing Systems 

In this paper, we develop a method that exploits auxiliary feedback to build a reward estimator with tight confidence bounds, which in turn leads to smaller regret.
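To make the idea of exploiting correlated auxiliary feedback concrete, the sketch below uses a classical control-variate estimator. It is only an illustration under stated assumptions, not necessarily the estimator developed in the paper: it assumes the mean of the auxiliary feedback (`aux_mean`) is known, and the names `control_variate_mean` and `compare_variances`, as well as the simulated data, are hypothetical.

```python
import numpy as np


def control_variate_mean(rewards, aux, aux_mean):
    """Estimate the mean reward using auxiliary feedback as a control variate.

    Assumption (not from the source): the true mean of the auxiliary
    feedback, aux_mean, is known to the learner.
    """
    rewards = np.asarray(rewards, dtype=float)
    aux = np.asarray(aux, dtype=float)
    aux_var = np.var(aux, ddof=1)
    if aux_var == 0.0:
        # Auxiliary feedback carries no information; fall back to the plain mean.
        return rewards.mean(), 0.0
    # Sample estimate of the variance-minimizing coefficient Cov(r, w) / Var(w).
    beta = np.cov(rewards, aux)[0, 1] / aux_var
    # Subtract the part of the reward noise explained by the auxiliary feedback.
    estimate = np.mean(rewards - beta * (aux - aux_mean))
    return estimate, beta


def compare_variances(trials=500, n=50, seed=0):
    """Simulate correlated (reward, auxiliary) pairs and compare estimator variance."""
    rng = np.random.default_rng(seed)
    plain, adjusted = [], []
    for _ in range(trials):
        aux = rng.normal(0.0, 1.0, size=n)                      # auxiliary feedback, mean 0
        rewards = 1.0 + 0.9 * aux + 0.3 * rng.normal(size=n)    # reward correlated with aux
        plain.append(rewards.mean())
        adjusted.append(control_variate_mean(rewards, aux, 0.0)[0])
    return np.var(plain), np.var(adjusted)


if __name__ == "__main__":
    var_plain, var_cv = compare_variances()
    print("variance of plain sample mean:      ", var_plain)
    print("variance of control-variate estimate:", var_cv)
```

In this toy setting the adjusted estimator has lower variance than the plain sample mean, and the reduction grows with the correlation between reward and auxiliary feedback. A lower-variance estimator is what allows narrower confidence intervals around the estimated reward.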