bandit optimization
- North America > United States (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States (0.28)
- North America > Canada > British Columbia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe
We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To solve this problem, we analyze the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization. We give theoretical guarantees for the performance of this algorithm over various classes of functions, and discuss the optimality of these results.
Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe
Quentin Berthet, Vianney Perchet
We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To solve this problem, we analyze the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization. We give theoretical guarantees for the performance of this algorithm over various classes of functions, and discuss the optimality of these results.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective
Achieving the full promise of the Thermodynamic V ariational Objective (TVO), a recently proposed variational lower bound on the log evidence involving a one-dimensional Riemann integral approximation, requires choosing a "schedule" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in V ariational Autoencoders and Sigmoid Belief Networks.
- North America > United States (0.28)
- North America > Canada > British Columbia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
- North America > United States (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
Quantum Non-Linear Bandit Optimization
Siam, Zakaria Shams, Guan, Chaowen, Liu, Chong
We study non-linear bandit optimization where the learner maximizes a black-box function with zeroth order function oracle, which has been successfully applied in many critical applications such as drug discovery and hyperparameter tuning. Existing works have showed that with the aid of quantum computing, it is possible to break the $\Omega(\sqrt{T})$ regret lower bound in classical settings and achieve the new $O(\mathrm{poly}\log T)$ upper bound. However, they usually assume that the objective function sits within the reproducing kernel Hilbert space and their algorithms suffer from the curse of dimensionality. In this paper, we propose the new Q-NLB-UCB algorithm which uses the novel parametric function approximation technique and enjoys performance improvement due to quantum fast-forward and quantum Monte Carlo mean estimation. We prove that the regret bound of Q-NLB-UCB is not only $O(\mathrm{poly}\log T)$ but also input dimension-free, making it applicable for high-dimensional tasks. At the heart of our analyses are a new quantum regression oracle and a careful construction of parameter uncertainty region. Our algorithm is also validated for its efficiency on both synthetic and real-world tasks.
- North America > United States > New York > Albany County > Albany (0.14)
- North America > United States > Wisconsin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- Research Report (0.64)
- Instructional Material > Course Syllabus & Notes (0.46)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)
Reviews: Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe
This is a very interesting paper at the intersection of bandit problems and stochastic optimization. The authors consider the setup where at each time step a decision maker takes an action and gets as a feedback a gradient estimate at the chosen point. The gradient estimate is noisy but the assumption is that we have control over how noisy the gradient estimate is. The challenge for the decision maker is to make a sequence of moves such that the average of all iterates has low error compared to the optima. So the goal is similar to the goal in stochastic optimization, but unlike stochastic optimization (but like bandit problems) one gets to see limited information about the loss function.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (3 more...)
Gaussian Process Bandit Optimization with Few Batches
In this paper, we consider the problem of black-box optimization using Gaussian Process (GP) bandit optimization with a small number of batches. Assuming the unknown function has a low norm in the Reproducing Kernel Hilbert Space (RKHS), we introduce a batch algorithm inspired by batched finite-arm bandit algorithms, and show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$, where the $O^\ast(\cdot)$ notation hides dimension-independent logarithmic factors and $\gamma_T$ is the maximum information gain associated with the kernel. This bound is near-optimal for several kernels of interest and improves on the typical $O^\ast(\sqrt{T}\gamma_T)$ bound, and our approach is arguably the simplest among algorithms attaining this improvement. In addition, in the case of a constant number of batches (not depending on $T$), we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches, focusing on the squared exponential and Mat\'ern kernels. The algorithmic upper bounds are shown to be nearly minimax optimal via analogous algorithm-independent lower bounds.