We consider the thresholding bandit problem, whose goal is to find arms of mean rewards above a given threshold $\theta$, with a fixed budget of $T$ trials. We introduce LSA, a new, simple and anytime algorithm that aims to minimize the aggregate regret (or the expected number of mis-classified arms). We prove that our algorithm is instance-wise asymptotically optimal. We also provide comprehensive empirical results to demonstrate the algorithm's superior performance over existing algorithms under a variety of different scenarios. Papers published at the Neural Information Processing Systems Conference.
Automated decision-making is one of the core objectives of artificial intelligence. Not surprisingly, over the past few years, entire new research fields have emerged to tackle that task. This blog post is concerned with regret minimization, one of the central tools in online learning. Regret minimization models the problem of repeated online decision making: an agent is called to make a sequence of decisions, under unknown (and potentially adversarial) loss functions. Regret minimization is a versatile mathematical abstraction, that has found a plethora of practical applications: portfolio optimization, computation of Nash equilibria, applications to markets and auctions, submodular function optimization, and more.
Tiger Woods has one regret, and it's not the 2009 cheating scandal that led to the end of his marriage to Elin Nordegren. The 40-year-old professional golfer revealed the one thing that he wishes he could've changed, during an hour-long interview with Charlie Rose on Thursday. "The only regret I have in life is not spending another year at Stanford [University]," Woods told Rose. "I wish I would've had one more year." WATCH: 'Cheater' Banner Flies Over Tiger Woods During U.S. Open.
We study optimal regret bounds for control in linear dynamical systems under adversarially changing strongly convex cost functions, given the knowledge of transition dynamics. This includes several well studied and influential frameworks such as the Kalman filter and the linear quadratic regulator. State of the art methods achieve regret which scales as T 0.5, where T is the time horizon. We show that the optimal regret in this fundamental setting can be significantly smaller, scaling as polylog(T). This regret bound is achieved by two different efficient iterative methods, online gradient descent and online natural gradient.