When you walk into Erven, Nick Erven's newish restaurant in Santa Monica, you may not immediately place it as vegan. There's a kind of deli counter up front selling salads and sandwiches to go, and the guy at the host station may hand you a shot glass of sangria as a gesture of goodwill. The smell may be typical of vegan restaurants, the funk of many simmering brassicas instead of the scent of charring meat -- but the sharp angles and textile blocks of the double-height dining room seem more welcoming than they did when this space was Real Food Daily. The snacks that everybody seems to be popping into their mouths -- jet-black squares of chickpea fritter scented with yuzu; crisply fried sunchokes with what tastes like a cross between romesco sauce and ketchup; crunchy nuggets of savory deep-fried dates -- are pretty much what you would expect to taste in a sleek tasting-menu restaurant. You bite down into what looks like a doughnut hole, and although the sour, dark purée of sauerkraut and smoked apples squirts halfway across the tablecloth, it is hard to see how Nick Erven has anything but pure animal pleasure on his mind.
How it got to be December is anyone's guess, but here we are, in the lull between one holiday and the next. Which means it's a great time to take a break from baking cookies and figuring out what to get Luke Walton for Christmas, and get back to exploring this town's complex and glorious restaurant scene. If you need a break from heavy holiday food, maybe try Erven, the subject of Jonathan Gold's latest review. It's vegan, it has things called "slurpables," and it has sauerkraut-stuffed doughnut holes. Yeah, yeah, I know, but Jonathan really liked them.
A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods. Here we explore the alternative approach of putting exponential weights (EW) first. We show that many standard methods and their regret bounds then follow as special cases by plugging in suitable surrogate losses and playing the EW posterior mean. For instance, we easily recover Online Gradient Descent by using EW with a Gaussian prior on linearized losses, and, more generally, all instances of Online Mirror Descent based on regular Bregman divergences also correspond to EW with a prior that depends on the mirror map. Furthermore, appropriate quadratic surrogate losses naturally give rise to Online Gradient Descent for strongly convex losses and to Online Newton Step. We further interpret several recent adaptive methods (iProd, Squint, and a variation of Coin Betting for experts) as a series of closely related reductions to exp-concave surrogate losses that are then handled by Exponential Weights. Finally, a benefit of our EW interpretation is that it opens up the possibility of sampling from the EW posterior distribution instead of playing the mean. As already observed by Bubeck and Eldan, this recovers the best-known rate in Online Bandit Linear Optimization.
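The claim that EW with a Gaussian prior on linearized losses recovers Online Gradient Descent can be checked numerically in one dimension. The sketch below is only an illustration under stated assumptions, not the construction from the paper: the domain is unconstrained, the prior is N(w1, sigma^2), the losses are the hypothetical squared losses (w - y)^2, and the OGD step size is taken to be eta * sigma^2, which is what completing the square in the exponent suggests. The function names and all parameter values are made up for this example. The EW posterior mean is computed by a brute-force sum over a grid, so the agreement with OGD is not built in by a closed-form formula.

```python
import math

def ew_posterior_mean(grads, eta, sigma, w1=0.0, lo=-20.0, hi=20.0, n=40001):
    """Mean of the density proportional to
    exp(-(w - w1)^2 / (2 sigma^2)) * exp(-eta * sum_s g_s * w),
    approximated by a sum over an evenly spaced grid (assumed wide enough)."""
    total_grad = sum(grads)
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    log_dens = [-(w - w1) ** 2 / (2 * sigma ** 2) - eta * total_grad * w
                for w in grid]
    shift = max(log_dens)  # subtract the maximum for numerical stability
    dens = [math.exp(v - shift) for v in log_dens]
    return sum(w * p for w, p in zip(grid, dens)) / sum(dens)

def compare(targets, eta=0.1, sigma=1.0):
    """Run EW (posterior mean) and OGD side by side on losses (w - y)^2."""
    grads_ew = []   # gradients observed at the EW iterates
    w_ogd = 0.0     # OGD starts at the prior mean w1 = 0
    ew_path, ogd_path = [], []
    for y in targets:
        w_ew = ew_posterior_mean(grads_ew, eta, sigma)
        ew_path.append(w_ew)
        ogd_path.append(w_ogd)
        grads_ew.append(2 * (w_ew - y))               # linearized loss for EW
        w_ogd -= eta * sigma ** 2 * 2 * (w_ogd - y)   # OGD step, size eta*sigma^2
    return ew_path, ogd_path

ew_path, ogd_path = compare([1.0, -0.5, 2.0, 0.3, 1.2])
for a, b in zip(ew_path, ogd_path):
    print(f"EW mean {a:+.6f}   OGD {b:+.6f}")
```

With a bounded domain the two would no longer coincide this simply, since the posterior would be a truncated Gaussian; the sketch deliberately avoids that case.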
Follow-the-Leader (FTL) is an intuitive sequential prediction strategy that guarantees constant regret in the stochastic setting, but has terrible performance for worst-case data. Other hedging strategies have better worst-case guarantees but may perform much worse than FTL if the data are not maximally adversarial. We introduce the FlipFlop algorithm, which is the first method that provably combines the best of both worlds. As part of our construction, we develop AdaHedge, which is a new way of dynamically tuning the learning rate in Hedge without using the doubling trick. AdaHedge refines a method by Cesa-Bianchi, Mansour and Stoltz (2007), yielding slightly improved worst-case guarantees. By interleaving AdaHedge and FTL, the FlipFlop algorithm achieves regret within a constant factor of the FTL regret, without sacrificing AdaHedge's worst-case guarantees. AdaHedge and FlipFlop do not need to know the range of the losses in advance; moreover, unlike earlier methods, both have the intuitive property that the issued weights are invariant under rescaling and translation of the losses. The losses are also allowed to be negative, in which case they may be interpreted as gains.
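The contrast between FTL and a hedging strategy on worst-case data can be illustrated with a standard two-expert example. The sketch below is not AdaHedge or FlipFlop; it compares plain FTL with Hedge at a fixed learning rate, and the loss sequence, function names, and the tuning eta = sqrt(8 ln K / T) are assumptions chosen for the illustration. The sequence is built so that the current leader always suffers loss 1 in the next round, which makes the FTL regret grow linearly in the number of rounds, while the fixed-rate Hedge regret stays within the usual sqrt(T ln K / 2) bound for losses in [0, 1].

```python
import math

def ftl_weights(cum_losses):
    """Follow-the-Leader: uniform weight on the experts with smallest cumulative loss."""
    best = min(cum_losses)
    leaders = [i for i, c in enumerate(cum_losses) if c == best]
    return [1.0 / len(leaders) if i in leaders else 0.0
            for i in range(len(cum_losses))]

def hedge_weights(cum_losses, eta):
    """Hedge: weights proportional to exp(-eta * cumulative loss)."""
    best = min(cum_losses)  # shifting by the minimum does not change the weights
    unnorm = [math.exp(-eta * (c - best)) for c in cum_losses]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def regret(losses, weight_fn):
    """Expected loss of the strategy minus the loss of the best expert in hindsight."""
    cum = [0.0] * len(losses[0])
    total = 0.0
    for loss in losses:
        w = weight_fn(cum)
        total += sum(wi * li for wi, li in zip(w, loss))
        cum = [c + li for c, li in zip(cum, loss)]
    return total - min(cum)

# Hypothetical adversarial sequence: after a tie-breaking first round,
# the losses alternate so that the leader is always the one who is hit.
losses = [(0.5, 0.0)] + [(0.0, 1.0), (1.0, 0.0)] * 50
T, K = len(losses), 2
eta = math.sqrt(8 * math.log(K) / T)

ftl_regret = regret(losses, ftl_weights)
hedge_regret = regret(losses, lambda cum: hedge_weights(cum, eta))
print("FTL regret:  ", ftl_regret)
print("Hedge regret:", hedge_regret)
```

Note that this fixed-rate Hedge needs T in advance to set eta, which is exactly the kind of prior knowledge that adaptive tuning is meant to avoid.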
We aim to design adaptive online learning algorithms that take advantage of any special structure that might be present in the learning task at hand, with as little manual tuning by the user as possible. A fundamental obstacle that comes up in the design of such adaptive algorithms is to calibrate a so-called step-size or learning rate hyperparameter depending on variance, gradient norms, etc. A recent technique promises to overcome this difficulty by maintaining multiple learning rates in parallel. This technique has been applied in the MetaGrad algorithm for online convex optimization and the Squint algorithm for prediction with expert advice. However, in both cases the user still has to provide in advance a Lipschitz hyperparameter that bounds the norm of the gradients. Although this hyperparameter is typically not available in advance, tuning it correctly is crucial: if it is set too small, the methods may fail completely; but if it is taken too large, performance deteriorates significantly. In the present work we remove this Lipschitz hyperparameter by designing new versions of MetaGrad and Squint that adapt to its optimal value automatically. We achieve this by dynamically updating the set of active learning rates. For MetaGrad, we further improve the computational efficiency of handling constraints on the domain of prediction, and we remove the need to specify the number of rounds in advance.
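The general idea of maintaining multiple learning rates in parallel can be sketched generically. The code below is neither MetaGrad nor Squint, and it does not update the set of active learning rates dynamically; it is a simplified stand-in in which one Hedge learner is run per learning rate on a fixed, exponentially spaced grid, and a second Hedge instance aggregates their predictions. The grid, the random losses, the meta learning rate, and all function names are assumptions made for this illustration, and losses are assumed to lie in [0, 1].

```python
import math
import random

def hedge(cum_losses, eta):
    """Weights proportional to exp(-eta * cumulative loss)."""
    best = min(cum_losses)
    unnorm = [math.exp(-eta * (c - best)) for c in cum_losses]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def run_grid(losses, etas, meta_eta):
    """Run one Hedge learner per eta and combine them with a meta-level Hedge."""
    K = len(losses[0])
    cum_expert = [0.0] * K            # cumulative loss of each expert
    cum_learner = [0.0] * len(etas)   # cumulative loss of each fixed-eta learner
    total = 0.0
    for loss in losses:
        learner_w = [hedge(cum_expert, eta) for eta in etas]
        meta_w = hedge(cum_learner, meta_eta)
        # Final weights on experts: meta-weighted mixture of the learners' weights.
        w = [sum(m * lw[i] for m, lw in zip(meta_w, learner_w)) for i in range(K)]
        total += sum(wi * li for wi, li in zip(w, loss))
        learner_loss = [sum(wi * li for wi, li in zip(lw, loss)) for lw in learner_w]
        cum_learner = [c + x for c, x in zip(cum_learner, learner_loss)]
        cum_expert = [c + li for c, li in zip(cum_expert, loss)]
    return total, cum_learner, min(cum_expert)

rng = random.Random(0)                       # fixed seed, hypothetical data
losses = [[rng.random() for _ in range(3)] for _ in range(200)]
etas = [2.0 ** -k for k in range(5)]         # grid: 1, 1/2, 1/4, 1/8, 1/16
meta_eta = math.sqrt(8 * math.log(len(etas)) / len(losses))

total, cum_learner, best_expert = run_grid(losses, etas, meta_eta)
print("combined loss:     ", total)
print("best learner loss: ", min(cum_learner))
print("best expert loss:  ", best_expert)
```

The combined loss is guaranteed to be close to that of the best learning rate on the grid, at the price of an extra meta-level learning rate that still has to be chosen; the methods described above are designed to avoid that kind of residual tuning.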