Online Learning with Sublinear Best-Action Queries

Neural Information Processing Systems 

In online learning, a decision maker repeatedly selects one of a set of actions, with the goal of minimizing the overall loss incurred.