Zeroth Order Non-convex optimization with Dueling-Choice Bandits

Xu, Yichong; Joshi, Aparna; Singh, Aarti; Dubrawski, Artur

Nov-3-2019–arXiv.org Machine Learning

We consider a novel setting of zeroth order non-convex optimization in which, in addition to querying the function value at a given point, we can also duel two points and observe which one has the larger function value. We refer to this setting as optimization with dueling-choice bandits, since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm, based on GP-UCB (Srinivas et al., 2009): instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform a constrained optimization and use comparisons to filter out suboptimal points. COMP-GP-UCB comes with a theoretical guarantee of $O(\frac{\Phi}{\sqrt{T}})$ on simple regret, where $T$ is the number of direct queries and $\Phi$ is an improved information gain corresponding to a comparison-based constraint set that restricts the search space for the optimum. In contrast, in the direct-query-only setting, $\Phi$ depends on the entire domain. Finally, we present experimental results that show the efficacy of our algorithm.
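The filtering idea behind COMP-GP-UCB can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a 1-D domain discretized to a grid, a noiseless duel oracle, a unit-variance RBF kernel with a fixed length scale, and a fixed exploration weight `beta`, and it models the comparison-based constraint set as a simple knockout in which each round one duel removes its loser. With noiseless duels the loser is never strictly better than the winner, so a maximizer always stays in the set. All function names (`duel`, `fit_gp`, `predict`, `comparison_filtered_ucb`) are hypothetical, and the paper's actual constraint set and information-gain analysis are not reproduced.

```python
import math
import random

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit signal variance (assumed hyperparameters)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def back_sub_t(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, noise=1e-4):
    """Factor the GP kernel matrix on the direct queries; None means 'prior only'."""
    if not xs:
        return None
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = back_sub_t(L, forward_sub(L, ys))  # alpha = K^{-1} y
    return L, alpha

def predict(model, xs, x):
    """GP posterior mean and standard deviation at x (zero-mean prior)."""
    if model is None:
        return 0.0, 1.0
    L, alpha = model
    k_star = [rbf(a, x) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)

def duel(f, a, b):
    """Hypothetical noiseless duel oracle: returns the point with the larger value."""
    return a if f(a) >= f(b) else b

def comparison_filtered_ucb(f, grid, rounds=25, beta=4.0, seed=0):
    """Toy GP-UCB restricted to a candidate set that duels keep shrinking."""
    rng = random.Random(seed)
    candidates = list(grid)  # comparison-based constraint set, starts as the whole grid
    xs, ys = [], []          # direct queries and their observed values
    for _ in range(rounds):
        model = fit_gp(xs, ys)
        ucb = {}
        for x in candidates:
            mean, std = predict(model, xs, x)
            ucb[x] = mean + math.sqrt(beta) * std
        # One duel per round: the UCB leader against a random rival.
        # The loser is never strictly better than the winner, so dropping it keeps a maximizer.
        leader = max(candidates, key=ucb.get)
        if len(candidates) > 1:
            rival = rng.choice([c for c in candidates if c != leader])
            loser = rival if duel(f, leader, rival) == leader else leader
            candidates.remove(loser)
        # Direct query: maximize UCB over the filtered set only.
        x_t = max(candidates, key=ucb.get)
        xs.append(x_t)
        ys.append(f(x_t))
    best = xs[max(range(len(xs)), key=lambda i: ys[i])]
    return best, candidates
```

For example, `comparison_filtered_ucb(lambda x: -(x - 0.7) ** 2, [i / 20 for i in range(21)])` spends its direct queries on high-UCB points while the duels shrink the candidate set; after 20 duels on this 21-point grid only the maximizer 0.7 remains. Restricting the UCB step to the shrinking set is the design point: exploration no longer spends direct queries on regions that comparisons have already ruled out.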

artificial intelligence, machine learning, optimization problem

Country:
- North America > United States
  - Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Sweden
  - Stockholm > Stockholm (0.04)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.66)
  - Machine Learning > Statistical Learning (0.46)
