Bounded Regret for Finite-Armed Structured Bandits
–Neural Information Processing Systems
We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problem-dependent lower bounds on the cumulative regret showing that at least in special cases the new algorithm is nearly optimal.
Neural Information Processing Systems
Dec-31-2014
- Country:
- Europe > France
- Hauts-de-France > Nord > Lille (0.04)
- North America > Canada
- Alberta (0.14)
- Oceania > Australia (0.04)
- Europe > France
- Genre:
- Research Report (0.46)
- Technology: