Stochastic Multi-Armed Bandits with Control Variates

Neural Information Processing Systems 

This paper studies a new variant of the stochastic multi-armed bandit problem in which auxiliary information about the arm rewards is available in the form of control variates. In many applications, such as queuing and wireless networks, the arm rewards are functions of some exogenous variables.
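
As a rough illustration of the control-variate idea mentioned above, the following Python sketch estimates a single arm's mean reward with and without a control variate whose mean is known. It is not the paper's algorithm: the function names, the linear reward model (reward = 1 + w + noise), and all parameter values are assumptions chosen only for this example.

```python
import random
import statistics


def control_variate_mean(rewards, controls, control_mean):
    """Estimate the mean reward using a control variate with known mean.

    Hypothetical helper for illustration: the coefficient beta is the
    sample covariance of (reward, control) divided by the sample
    variance of the control.
    """
    n = len(rewards)
    r_bar = statistics.fmean(rewards)
    w_bar = statistics.fmean(controls)
    cov = sum((r - r_bar) * (w - w_bar) for r, w in zip(rewards, controls)) / (n - 1)
    var = sum((w - w_bar) ** 2 for w in controls) / (n - 1)
    beta = cov / var if var > 0 else 0.0
    # Shift the sample mean by how far the control's sample mean is from its known mean.
    return r_bar + beta * (control_mean - w_bar)


def simulate(n_trials=2000, n_samples=30, seed=0):
    """Compare the variance of the plain and control-variate estimators.

    Assumed toy model: control w ~ N(0, 1) with known mean 0, and
    reward = 1 + w + N(0, 0.5^2) noise.
    """
    rng = random.Random(seed)
    plain, adjusted = [], []
    for _ in range(n_trials):
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        rewards = [1.0 + w + rng.gauss(0.0, 0.5) for w in controls]
        plain.append(statistics.fmean(rewards))
        adjusted.append(control_variate_mean(rewards, controls, 0.0))
    return statistics.pvariance(plain), statistics.pvariance(adjusted)


if __name__ == "__main__":
    plain_var, cv_var = simulate()
    print(f"variance of sample mean:        {plain_var:.5f}")
    print(f"variance with control variate:  {cv_var:.5f}")
```

In this toy setting the control variate is strongly correlated with the reward, so the adjusted estimator should show a visibly smaller variance than the plain sample mean; a tighter mean estimate is what would allow a bandit algorithm to use narrower confidence intervals.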
