PAC-Bayesian Analysis of Contextual Bandits

François Laviolette

Neural Information Processing Systems 

We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits with side information (also known as contextual bandits).