AITopics | bandit convex optimization

algorithm, convex optimization, optimization, (13 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Improved Dimension Dependence for Bandit Convex Optimization with Gradient Variations

Yu, Hang, Yan, Yu-Hu, Zhao, Peng

arXiv.org Machine Learning | Feb-5-2026

Gradient-variation online learning has drawn increasing attention due to its deep connections to game theory, optimization, etc. It has been studied extensively in the full-information setting, but is underexplored with bandit feedback. In this work, we focus on gradient variation in Bandit Convex Optimization (BCO) with two-point feedback. By proposing a refined analysis on the non-consecutive gradient variation, a fundamental quantity in gradient variation with bandits, we improve the dimension dependence for both convex and strongly convex functions compared with the best known results (Chiang et al., 2013). Our improved analysis for the non-consecutive gradient variation also implies other favorable problem-dependent guarantees, such as gradient-variance and small-loss regrets. Beyond the two-point setup, we demonstrate the versatility of our technique by achieving the first gradient-variation bound for one-point bandit linear optimization over hyper-rectangular domains. Finally, we validate the effectiveness of our results in more challenging tasks such as dynamic/universal regret minimization and bandit games, establishing the first gradient-variation dynamic and universal regret bounds for two-point BCO and fast convergence rates in bandit games.
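To make the two-point feedback model concrete, here is a minimal sketch of the standard two-point gradient estimator, which queries the loss at two symmetric perturbations of the current point. It illustrates the feedback setting only and is not the algorithm or analysis of the paper above; the names `two_point_gradient` and `average_estimate`, the exploration radius `delta`, and the quadratic test loss are all assumptions made for illustration.

```python
import math
import random

def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from the two loss values f(x + delta*u), f(x - delta*u)."""
    dim = len(x)
    u = sample_unit_sphere(dim, rng)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    # Finite difference along u, rescaled by the dimension.
    scale = dim * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]

def average_estimate(f, x, delta, samples, seed=0):
    """Average many independent estimates to show they concentrate near the gradient."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        g = two_point_gradient(f, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]

if __name__ == "__main__":
    # Hypothetical loss f(x) = ||x||^2, whose true gradient is 2x.
    loss = lambda x: sum(c * c for c in x)
    print(average_estimate(loss, [1.0, 0.0, -1.0], delta=0.1, samples=20000))
```

The factor `dim` compensates for the fact that a random unit direction captures only a 1/dim share of the gradient on average, which is one place where dimension dependence enters bounds of this kind.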

artificial intelligence, bandit convex optimization, machine learning, (14 more...)

arXiv.org Machine Learning

2602.04761

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.87)

Locally Differentially Private (Contextual) Bandits Learning

Neural Information Processing Systems | Dec-24-2025, 07:09:15 GMT

We study locally differentially private (LDP) bandit learning. First, we propose simple black-box reduction frameworks that can solve a large family of context-free bandit learning problems with an LDP guarantee. Based on these frameworks, we improve the previous best results for private bandit learning with one-point feedback, such as private Bandit Convex Optimization, and obtain the first results for Bandit Convex Optimization (BCO) with multi-point feedback under LDP. The LDP guarantee and black-box nature make our frameworks more attractive in real applications than previous specifically designed and relatively weaker differentially private (DP) algorithms. Further, we extend our algorithm to Generalized Linear Bandits with regret bound $\tilde{\mathcal{O}}(T^{3/4}/\varepsilon)$ under $(\varepsilon, \delta)$-LDP, which is conjectured to be optimal. Note that, given the existing $\Omega(T)$ lower bound for DP contextual linear bandits (Shariff & Sheffet, NeurIPS 2018), our result shows a fundamental difference between LDP and DP for contextual bandits.
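As a rough illustration of what a local privacy guarantee looks like in a bandit loop, the sketch below privatizes each loss value on the user side with the Laplace mechanism before any learner sees it. This is a generic textbook mechanism, not necessarily the reduction used in the paper above; the function names and the assumption that losses lie in [0, 1] are illustrative.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def privatize_loss(loss, epsilon, rng):
    """Release a loss in [0, 1] under epsilon-LDP via the Laplace mechanism.

    Assumption: losses are clipped to [0, 1], so the sensitivity is 1 and
    noise of scale 1/epsilon suffices.
    """
    clipped = min(1.0, max(0.0, loss))
    return clipped + laplace_noise(1.0 / epsilon, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    reports = [privatize_loss(0.3, epsilon=1.0, rng=rng) for _ in range(20000)]
    # Individual reports are noisy, but their average stays close to the true loss.
    print(sum(reports) / len(reports))
```

Because the noise is zero-mean, a black-box learner can treat the privatized values as noisy loss observations; the price is extra variance that grows as `epsilon` shrinks.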

bandit, contextual, name change, (7 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.78)

Optimistic Bandit Convex Optimization

Scott Yang, Mehryar Mohri

Neural Information Processing Systems | Nov-21-2025, 09:05:00 GMT

We introduce the general and powerful scheme of predicting information re-use in optimization algorithms.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Improved Regret for Bandit Convex Optimization with Delayed Feedback

Yuanyu Wan, Chang Yao

Neural Information Processing Systems | Oct-9-2025, 16:56:20 GMT

However, there is a large gap between its delay-dependent part, i.e.,

algorithm, convex optimization, optimization, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education > Educational Setting > Online (0.47)
Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

ebpc-9

J Sun

Neural Information Processing Systems | Oct-8-2025, 14:07:38 GMT

The best-known sublinear regret algorithm of Gradu et al. [2020] has a

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff

Ofer Dekel, Ronen Eldan, Tomer Koren

Neural Information Processing Systems | Oct-2-2025, 06:47:01 GMT

Bandit convex optimization is one of the fundamental problems in the field of online learning.

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Bandit Convex Optimization: Towards Tight Bounds

Neural Information Processing Systems | Sep-30-2025, 09:01:35 GMT

Bandit Convex Optimization (BCO) is a fundamental framework for decision making under uncertainty, which generalizes many problems from the realm of online and statistical learning. While the special case of linear cost functions is well understood, a gap on the attainable regret for BCO with nonlinear losses remains an important open question. In this paper we take a step towards understanding the best attainable regret bounds for BCO: we give an efficient and near-optimal regret algorithm for BCO with strongly-convex and smooth loss functions. In contrast to previous works on BCO that use time invariant exploration schemes, our method employs an exploration scheme that shrinks with time.

bandit convex optimization, name change, tight bound, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Improved Regret for Bandit Convex Optimization with Delayed Feedback

Neural Information Processing Systems | May-26-2025, 14:39:18 GMT

We investigate bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed under an arbitrary delay. Let $n$, $T$, $\bar{d}$ denote the dimensionality, time horizon, and average delay, respectively. Previous studies have achieved an $O(\sqrt{n}T^{3/4} + (n\bar{d})^{1/3}T^{2/3})$ regret bound for this problem, whose delay-independent part matches the regret of the classical non-delayed bandit gradient descent algorithm. However, there is a large gap between its delay-dependent part, i.e., $O((n\bar{d})^{1/3}T^{2/3})$, and an existing $\Omega(\sqrt{\bar{d}T})$ lower bound. In this paper, we illustrate that this gap can be filled in the worst case, where $\bar{d}$ is very close to the maximum delay $d$. Specifically, we first develop a novel algorithm, and prove that it enjoys a regret bound of $O(\sqrt{n}T^{3/4} + \sqrt{dT})$ in general.
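To get a feel for the size of the gap described above, the following sketch evaluates the two rates numerically, reading each bound as the sum of a delay-independent part and a delay-dependent part. It drops the constants hidden in the O(·) notation, so the numbers indicate only how the terms scale and are not actual regret values; the function names and the sample values of `n`, `T`, and the delays are assumptions chosen for illustration.

```python
def delay_independent_term(n, T):
    """sqrt(n) * T^(3/4), the part shared by both bounds."""
    return n ** 0.5 * T ** 0.75

def previous_delay_term(n, T, avg_delay):
    """(n * avg_delay)^(1/3) * T^(2/3), delay-dependent part of the earlier bound."""
    return (n * avg_delay) ** (1.0 / 3.0) * T ** (2.0 / 3.0)

def improved_delay_term(T, max_delay):
    """sqrt(max_delay * T), delay-dependent part of the new bound."""
    return (max_delay * T) ** 0.5

def compare(n, T, avg_delay, max_delay):
    """Return (previous rate, improved rate) with all constants set to 1."""
    base = delay_independent_term(n, T)
    return (base + previous_delay_term(n, T, avg_delay),
            base + improved_delay_term(T, max_delay))

if __name__ == "__main__":
    # Hypothetical worst case: the average delay equals the maximum delay.
    old, new = compare(n=10, T=10**6, avg_delay=1000, max_delay=1000)
    print(f"previous rate: {old:.0f}, improved rate: {new:.0f}")
```

In the worst case where the average delay equals the maximum delay `d`, the new delay term is the smaller one whenever `d` is at most `n**2 * T`, which covers every delay no longer than the horizon.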

artificial intelligence, bandit convex optimization, machine learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Bandit Convex Optimization: Towards Tight Bounds

Elad Hazan, Kfir Levy

Neural Information Processing Systems | Feb-9-2025, 19:22:44 GMT

Bandit Convex Optimization (BCO) is a fundamental framework for decision making under uncertainty, which generalizes many problems from the realm of online and statistical learning. While the special case of linear cost functions is well understood, a gap on the attainable regret for BCO with nonlinear losses remains an important open question. In this paper we take a step towards understanding the best attainable regret bounds for BCO: we give an efficient and near-optimal regret algorithm for BCO with strongly-convex and smooth loss functions. In contrast to previous works on BCO that use time invariant exploration schemes, our method employs an exploration scheme that shrinks with time.

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)
