We describe a novel algorithm for noisy global optimisation and continuum-armed bandits, with good convergence properties over any continuous reward function having finitely many polynomial maxima. Over such functions, our algorithm achieves square-root regret in bandits, and inverse-square-root error in optimisation, without prior information. Our algorithm works by reducing these problems to tree-armed bandits, and we also provide new results in this setting. We show it is possible to adaptively combine multiple trees so as to minimise the regret, and also give near-matching lower bounds on the regret in terms of the zooming dimension.

Ho, Chien-Ju, Slivkins, Aleksandrs, Vaughan, Jennifer Wortman

Crowdsourcing markets have emerged as a popular platform for matching available workers with tasks to complete. The payment for a particular task is typically set by the task's requester, and may be adjusted based on the quality of the completed work, for example, through the use of "bonus" payments. In this paper, we study the requester's problem of dynamically adjusting quality-contingent payments for tasks. We consider a multi-round version of the well-known principal-agent model, whereby in each round a worker makes a strategic choice of the effort level which is not directly observable by the requester. In particular, our formulation significantly generalizes the budget-free online task pricing problems studied in prior work. We treat this problem as a multi-armed bandit problem, with each "arm" representing a potential contract. To cope with the large (and in fact, infinite) number of arms, we propose a new algorithm, AgnosticZooming, which discretizes the contract space into a finite number of regions, effectively treating each region as a single arm. This discretization is adaptively refined, so that more promising regions of the contract space are eventually discretized more finely. We analyze this algorithm, showing that it achieves regret sublinear in the time horizon and substantially improves over non-adaptive discretization (which is the only competing approach in the literature). Our results advance the state of art on several different topics: the theory of crowdsourcing markets, principal-agent problems, multi-armed bandits, and dynamic pricing.

Multi-armed bandits a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous body of work has accumulated over the years, covered in several books and surveys. This book provides a more introductory, textbook-like treatment of the subject. Each chapter tackles a particular line of work, providing a self-contained, teachable technical introduction and a review of the more advanced results. The chapters are as follows: Stochastic bandits; Lower bounds; Bayesian Bandits and Thompson Sampling; Lipschitz Bandits; Full Feedback and Adversarial Costs; Adversarial Bandits; Linear Costs and Semi-bandits; Contextual Bandits; Bandits and Zero-Sum Games; Bandits with Knapsacks; Incentivized Exploration and Connections to Mechanism Design.

Koren, Tomer, Livni, Roi, Mansour, Yishay

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure $\mathcal{C}$ of the underlying metric which depends on its covering numbers. In finite metric spaces with $k$ actions, we give an efficient algorithm that achieves regret of the form $\widetilde(\max\set{\mathcal{C}^{1/3}T^{2/3},\sqrt{kT}})$, and show that this is the best possible. Our regret bound generalizes previous known regret bounds for some special cases: (i) the unit-switching cost regret $\widetilde{\Theta}(\max\set{k^{1/3}T^{2/3},\sqrt{kT}})$ where $\mathcal{C}=\Theta(k)$, and (ii) the interval metric with regret $\widetilde{\Theta}(\max\set{T^{2/3},\sqrt{kT}})$ where $\mathcal{C}=\Theta(1)$. For infinite metrics spaces with Lipschitz loss functions, we derive a tight regret bound of $\widetilde{\Theta}(T^{\frac{d+1}{d+2}})$ where $d \ge 1$ is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.

We consider the Lipschitz bandit optimization problem with an emphasis on practical efficiency. Although there is rich literature on regret analysis of this type of problem, e.g., [Kleinberg et al. 2008, Bubeck et al. 2011, Slivkins 2014], their proposed algorithms suffer from serious practical problems including extreme time complexity and dependence on oracle implementations. With this motivation, we propose a novel algorithm with an Upper Confidence Bound (UCB) exploration, namely Tree UCB-Hoeffding, using adaptive partitions. Our partitioning scheme is easy to implement and does not require any oracle settings. With a tree-based search strategy, the total computational cost can be improved to $\mathcal{O}(T\log T)$ for the first $T$ iterations. In addition, our algorithm achieves the regret lower bound up to a logarithmic factor.