Stochastic convex optimization with bandit feedback
–Neural Information Processing Systems
This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x X. We demonstrate a generalization of the ellipsoid algorithm that incurs Õ(poly(d) T) regret. Since any algorithm has regret at least Ω( T) on this problem, our algorithm is optimal in terms of the scaling with T.
Neural Information Processing Systems
Mar-15-2024, 02:59:52 GMT
- Country:
- North America > United States
- Pennsylvania (0.04)
- New York (0.04)
- North America > United States
- Genre:
- Research Report (0.46)
- Technology: