AITopics | gp-ucb

In the kernelized bandit problem, a learner aims to sequentially compute the optimum of a function lying in a reproducing kernel Hilbert space given only noisy evaluations at sequentially chosen points. In particular, the learner aims to minimize regret, which is a measure of the suboptimality of the choices made. Arguably the most popular algorithm is the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm, which involves acting based on a simple linear estimator of the unknown function. Despite its popularity, existing analyses of GP-UCB give a suboptimal regret rate, which fails to be sublinear for many commonly used kernels such as the Matérn kernel. This has led to a longstanding open question: are existing regret analyses for GP-UCB tight, or can bounds be improved by using more sophisticated analytical techniques? In this work, we resolve this open question and show that GP-UCB enjoys nearly optimal regret. In particular, our results yield sublinear regret rates for the Matérn kernel, improving over the state-of-the-art analyses and partially resolving a COLT open problem posed by Vakili et al. Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel k. Applying this key idea together with a largely overlooked concentration result in separable Hilbert spaces (for which we provide an independent, simplified derivation), we are able to provide a tighter analysis of the GP-UCB algorithm.
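For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal sketch of a generic GP-UCB loop on a one-dimensional grid. It is not the paper's method: the squared-exponential kernel, the fixed ridge parameter `reg`, the constant exploration weight `beta`, and all function names are assumptions chosen for illustration, and the paper's smoothness-dependent regularization is not reproduced here.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def posterior(x_obs, y_obs, x_grid, reg=0.1):
    """Kernel ridge mean and posterior std on x_grid; reg is an assumed fixed ridge term."""
    K = rbf_kernel(x_obs, x_obs) + reg * np.eye(len(x_obs))
    k_star = rbf_kernel(x_grid, x_obs)
    mean = k_star @ np.linalg.solve(K, y_obs)
    # Prior variance is 1 for this kernel; subtract the explained part.
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def gp_ucb(f, x_grid, rounds=30, beta=2.0, noise_std=0.1, seed=0):
    """Run a UCB loop: query the grid point maximizing mean + beta * std."""
    rng = np.random.default_rng(seed)
    xs, ys, regret = [], [], 0.0
    f_max = f(x_grid).max()
    for _ in range(rounds):
        if xs:
            mean, std = posterior(np.array(xs), np.array(ys), x_grid)
        else:
            # No data yet: fall back to the prior (zero mean, unit std).
            mean, std = np.zeros_like(x_grid), np.ones_like(x_grid)
        x_t = x_grid[np.argmax(mean + beta * std)]
        y_t = f(x_t) + noise_std * rng.normal()  # noisy evaluation
        xs.append(x_t)
        ys.append(y_t)
        regret += f_max - f(x_t)  # cumulative regret against the grid optimum
    return np.array(xs), regret

# Hypothetical test function and grid, for illustration only.
f = lambda x: np.sin(3.0 * x)
grid = np.linspace(0.0, 2.0, 201)
queried, total_regret = gp_ucb(f, grid)
```

The cumulative regret returned here is the quantity whose growth rate the abstract discusses; in this sketch `beta` and `reg` stay constant, whereas regret analyses typically let such parameters depend on the round or on the kernel.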

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.49)

We thank the reviewers for their constructive comments

Neural Information Processing Systems, Aug-20-2025, 02:34:02 GMT

We thank the reviewers for their constructive comments. The two terms on the RHS of Eq. (13) are monotone increasing functions, and Using our Lemma 5.1's proof, Lemma 5.8 and Theorem 2's proof in Srinivas et al [19], the regret bound GP-UCB is chosen as it has ability to analyze convergence, which is very important in the unknown search space setting. EI convergence can be shown only in noiseless setting, PI/ES/PES do not have convergence proof yet). Thank you for the compliment though. We have conducted more experiments with the 10-dimensional functions Ackley10 and Levy10.

constructive comment, experiment, ucb, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

bb073f2855d769be5bf191f6378f7150-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-16-2025, 03:11:51 GMT

We thank the reviewers for the positive and constructive feedback. Below we respond to their questions. The regret curves have a very tight confidence bound, starting from the very first iterations. Variances are almost the same across iterations? We did mention this in detail in our supp.

algorithm, error bar, search space, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.33)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.33)

Risk-averse Heteroscedastic Bayesian Optimization

Neural Information Processing Systems, Aug-15-2025, 22:43:13 GMT

Many black-box optimization tasks arising in high-stakes applications require risk-averse decisions. The standard Bayesian optimization (BO) paradigm, however, optimizes the expected value only. We generalize BO to trade mean and input-dependent variance of the objective, both of which we assume to be unknown a priori.

artificial intelligence, machine learning, optimization problem, (16 more...)

Neural Information Processing Systems

Country: