Online Optimization in X-Armed Bandits

Bubeck, Sébastien, Stoltz, Gilles, Szepesvári, Csaba, Munos, Rémi

Dec-31-2009–Neural Information Processing Systems

We consider a generalization of stochastic bandit problems in which the set of arms, X, is allowed to be a generic topological space. We constrain the mean-payoff function with a dissimilarity function over X in a way that is more general than Lipschitz. We construct an arm selection policy whose regret improves upon previous results for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Hölder with a known exponent, then the expected regret is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of growth of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider.
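
To make the headline claim concrete, the bound can be written out as below. This is only a sketch, assuming the usual definition of cumulative regret and using notation the abstract does not itself fix: $f$ for the mean-payoff function, $f^*$ for its supremum over X, $X_t$ for the arm played in round $t$, and $C$, $c$ for positive constants that the abstract leaves unspecified.

$$
R_n = n f^* - \sum_{t=1}^{n} f(X_t), \qquad f^* = \sup_{x \in X} f(x),
$$

$$
\mathbb{E}[R_n] \le C \, \sqrt{n} \, (\log n)^{c}.
$$

Under these assumptions the exponent of $n$ is $1/2$ whatever the dimension of the hypercube, which is the sense in which the growth rate of the regret is independent of the dimension.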


