Stochastic Multi-Armed Bandits with Control Variates
Neural Information Processing Systems
This paper studies a new variant of the stochastic multi-armed bandit problem in which auxiliary information about the arm rewards is available in the form of control variates. In many applications, such as queuing and wireless networks, the arm rewards are functions of some exogenous variables.
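As a rough illustration of the control-variate idea mentioned in the abstract, the sketch below estimates a single arm's mean reward from samples paired with an exogenous variable whose mean is known. It is a generic textbook control-variate estimator, not necessarily the paper's algorithm. The function name `control_variate_mean`, the linear reward model, and all numeric values are assumptions chosen for the example.

```python
import random
import statistics


def control_variate_mean(rewards, controls, control_mean):
    """Estimate E[reward] using a control variate with a known mean.

    Assumption: each reward sample is paired with one sample of an
    exogenous variable (the control variate) whose true mean is known.
    """
    n = len(rewards)
    r_bar = statistics.fmean(rewards)
    w_bar = statistics.fmean(controls)
    # Sample covariance between rewards and controls.
    cov = sum((r - r_bar) * (w - w_bar)
              for r, w in zip(rewards, controls)) / (n - 1)
    # Estimated coefficient that minimizes the variance of the adjusted mean.
    beta = cov / statistics.variance(controls)
    # Correct the sample mean by the control variate's observed deviation.
    return r_bar - beta * (w_bar - control_mean)


# Hypothetical arm: reward = 1 + 2 * w + small noise, with E[w] = 0 known.
rng = random.Random(0)
controls = [rng.gauss(0.0, 1.0) for _ in range(200)]
rewards = [1.0 + 2.0 * w + rng.gauss(0.0, 0.1) for w in controls]

plain = statistics.fmean(rewards)
adjusted = control_variate_mean(rewards, controls, control_mean=0.0)
print(f"plain sample mean:        {plain:.4f}")
print(f"control-variate estimate: {adjusted:.4f}")
```

When the reward is strongly correlated with the exogenous variable, as in this toy model, the adjusted estimate typically has much lower variance than the plain sample mean, which is the kind of auxiliary information a bandit algorithm could exploit for tighter confidence bounds.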
Nov-15-2025, 23:47:25 GMT
- Country:
  - Asia
    - India > Maharashtra > Mumbai (0.04)
    - Singapore (0.04)
  - Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Genre:
  - Research Report (0.66)
- Technology: