Improved Regret for Bandit Convex Optimization with Delayed Feedback

May-26-2025, 14:39:18 GMT–Neural Information Processing Systems

We investigate bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed under an arbitrary delay. Let n,T,\bar{d} denote the dimensionality, time horizon, and average delay, respectively. Previous studies have achieved an O(\sqrt{n}T {3/4} (n\bar{d}) {1/3}T {2/3}) regret bound for this problem, whose delay-independent part matches the regret of the classical non-delayed bandit gradient descent algorithm. However, there is a large gap between its delay-dependent part, i.e., O((n\bar{d}) {1/3}T {2/3}), and an existing \Omega(\sqrt{\bar{d}T}) lower bound. In this paper, we illustrate that this gap can be filled in the worst case, where \bar{d} is very close to the maximum delay d . Specifically, we first develop a novel algorithm, and prove that it enjoys a regret bound of O(\sqrt{n}T {3/4} \sqrt{dT}) in general.

artificial intelligence, bandit convex optimization, machine learning, (7 more...)

Neural Information Processing Systems

May-26-2025, 14:39:18 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)