Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

Lattimore, Tor

arXiv.org Machine Learning 

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
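For context, the quantity being bounded can be written in the usual formulation of adversarial bandit convex optimisation; the notation below (a convex constraint set $K \subseteq \mathbb{R}^d$, adversarially chosen convex losses $f_1, \dots, f_n$, and the learner's queried points $x_1, \dots, x_n$) follows standard conventions rather than the paper itself:
$$ \mathfrak{R}_n^\star \;=\; \inf_{\text{policy}} \, \sup_{f_1, \dots, f_n} \, \mathbb{E}\Big[ \sum_{t=1}^{n} f_t(x_t) \;-\; \min_{x \in K} \sum_{t=1}^{n} f_t(x) \Big] \;\le\; O\big(d^{2.5} \sqrt{n} \log(n)\big), $$
where after each round the learner observes only the scalar value $f_t(x_t)$, which is the zeroth-order (bandit) feedback model.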
