Optimality of the Subgradient Algorithm in the Stochastic Setting

Anderson, Daron, Leith, Douglas

arXiv.org Machine Learning 

Department of Computer Science and Statistics Trinity College Dublin Ireland September 2019 Abstract Recently Jaouad Mourtada and St ephane Ga ıffas showed the anytime hedge algorithm has pseudo-regret O (log( d)/) if the cost vectors are generated by an i.i.d sequence in the cube [0, 1] d . This is remarkable because the Hedge algorithm was designed for the antagonistic setting. We prove a similar result for the anytime subgradient algorithm on the simplex. Given i.i.d cost vectors in the unit ball our pseudo-regret bound is O (1 /) and does not depend on the dimension of the problem. 1 Introduction and Related Work Online convex optimisation comes from Zinkevich [2003] who gave the following toy example: Suppose at the start of the year we must choose which of two crops to plant. Bananas grow best in warm weather and potatoes grow best in cold weather. We may plant all bananas or all potatoes or mix the two in any proportion. The average temperature for the year determines which of the two crops grows better and thus the profit for that year. Suppose also we have two neighbors, one who plants only potatoes every year and one who plants only bananas. Zincevich gave algorithms whereby after N seasons the more succesful of the two neighborhours will be only O ( N) more succesful than us. Taking averages the difference is O (1/ N) which decays to zero. Formally suppose a 1,a 2,... R d are cost vectors. On turn n we know only a 1,...,a n 1 and must select an action x n in the d-simplex S with a mind to minimising the sum null N i 1a i · x i. There exist algorithms whereby the regretnull N i 1a i · x i null N i 1a i · x has order O ( N) for all x S simultaneously. Remarkably these algorithms work when the only assumption is a uniform bound for the cost vectors. To extend the metaphor, there is a planting strategy that is effective even if the weather is selected antagonistically by nature. 1 arXiv:1909.05007v1 One popular model for predictable weather is to assume a 1,a 2,... are generated from some i.i.d sequence 1 of random variables with expectation a . In this case a 1,a 2,...,a n 1 provide estimates for a n and there are strategies where for E[a i] a the pseudo-regret null N i 1a · x i null N i 1a · x has expectation O (log N). Such algorithms exist even in the bandit setting, where on turn n we know only the sequence of costs a 1 · x 1,a 2 · x 2,...,a n 1 · x n 1 rather than the full information setting where we know the cost vectors a 1,a 2,...,a n 1 themselves. The bandit setting comes from Flaxman et al. [2004].

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found