The Price of Bandit Information for Online Optimization
Dani, Varsha, Kakade, Sham M., Hayes, Thomas P.
–Neural Information Processing Systems
We present sharp rates of convergence (with respect to additive regret) for both the full information setting (where the cost function is revealed at the end of each round) and the bandit setting (where only the scalar cost incurred is revealed). In particular, this paper is concerned with the price of bandit information, by which we mean the ratio of the best achievable regret in the bandit setting to that in the full-information setting.
Neural Information Processing Systems
Dec-31-2008