Goto

Collaborating Authors

 bandit convex optimization



Improved Dimension Dependence for Bandit Convex Optimization with Gradient Variations

Yu, Hang, Yan, Yu-Hu, Zhao, Peng

arXiv.org Machine Learning

Gradient-variation online learning has drawn increasing attention due to its deep connections to game theory, optimization, etc. It has been studied extensively in the full-information setting, but is underexplored with bandit feedback. In this work, we focus on gradient variation in Bandit Convex Optimization (BCO) with two-point feedback. By proposing a refined analysis on the non-consecutive gradient variation, a fundamental quantity in gradient variation with bandits, we improve the dimension dependence for both convex and strongly convex functions compared with the best known results (Chiang et al., 2013). Our improved analysis for the non-consecutive gradient variation also implies other favorable problem-dependent guarantees, such as gradient-variance and small-loss regrets. Beyond the two-point setup, we demonstrate the versatility of our technique by achieving the first gradient-variation bound for one-point bandit linear optimization over hyper-rectangular domains. Finally, we validate the effectiveness of our results in more challenging tasks such as dynamic/universal regret minimization and bandit games, establishing the first gradient-variation dynamic and universal regret bounds for two-point BCO and fast convergence rates in bandit games.

  Country:
  Genre: Research Report (0.70)
  Industry: Leisure & Entertainment > Games (0.34)

Locally Differentially Private (Contextual) Bandits Learning

Neural Information Processing Systems

We study locally differentially private (LDP) bandits learning in this paper. First, we propose simple black-box reduction frameworks that can solve a large family of context-free bandits learning problems with LDP guarantee. Based on our frameworks, we can improve previous best results for private bandits learning with one-point feedback, such as private Bandits Convex Optimization etc, and obtain the first results for Bandits Convex Optimization (BCO) with multi-point feedback under LDP. LDP guarantee and black-box nature make our frameworks more attractive in real applications compared with previous specifically designed and relatively weaker differentially private (DP) algorithms. Further, we also extend our algorithm to Generalized Linear Bandits with regret bound $\tilde{\mc{O}}(T^{3/4}/\varepsilon)$ under $(\varepsilon, \delta)$-LDP and it is conjectured to be optimal. Note given existing $\Omega(T)$ lower bound for DP contextual linear bandits (Shariff & Sheffet, NeurIPS 2018), our result shows a fundamental difference between LDP and DP for contextual bandits.






Bandit Convex Optimization: Towards Tight Bounds

Neural Information Processing Systems

Bandit Convex Optimization (BCO) is a fundamental framework for decision making under uncertainty, which generalizes many problems from the realm of online and statistical learning. While the special case of linear cost functions is well understood, a gap on the attainable regret for BCO with nonlinear losses remains an important open question. In this paper we take a step towards understanding the best attainable regret bounds for BCO: we give an efficient and near-optimal regret algorithm for BCO with strongly-convex and smooth loss functions. In contrast to previous works on BCO that use time invariant exploration schemes, our method employs an exploration scheme that shrinks with time.


Improved Regret for Bandit Convex Optimization with Delayed Feedback

Neural Information Processing Systems

We investigate bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed under an arbitrary delay. Let n,T,\bar{d} denote the dimensionality, time horizon, and average delay, respectively. Previous studies have achieved an O(\sqrt{n}T {3/4} (n\bar{d}) {1/3}T {2/3}) regret bound for this problem, whose delay-independent part matches the regret of the classical non-delayed bandit gradient descent algorithm. However, there is a large gap between its delay-dependent part, i.e., O((n\bar{d}) {1/3}T {2/3}), and an existing \Omega(\sqrt{\bar{d}T}) lower bound. In this paper, we illustrate that this gap can be filled in the worst case, where \bar{d} is very close to the maximum delay d . Specifically, we first develop a novel algorithm, and prove that it enjoys a regret bound of O(\sqrt{n}T {3/4} \sqrt{dT}) in general.


Bandit Convex Optimization: Towards Tight Bounds

Elad Hazan, Kfir Levy

Neural Information Processing Systems

Bandit Convex Optimization (BCO) is a fundamental framework for decision making under uncertainty, which generalizes many problems from the realm of online and statistical learning. While the special case of linear cost functions is well understood, a gap on the attainable regret for BCO with nonlinear losses remains an important open question. In this paper we take a step towards understanding the best attainable regret bounds for BCO: we give an efficient and near-optimal regret algorithm for BCO with strongly-convex and smooth loss functions. In contrast to previous works on BCO that use time invariant exploration schemes, our method employs an exploration scheme that shrinks with time.