Learning Management
Reviews: Efficient Second Order Online Learning by Sketching
The present work takes a significant step in addressing this. The primary contribution of the paper are variations of Online Newton Step that remove this drawback using a sketching approximation to the scaling matrix and a clever implementation of sparse updates. The primary theoretical contributions are the analysis of the RP and FD versions of the algorithm. For RP they show a regret bound which holds when the matrix G_T (the matrix of observed gradients) is actually low-rank. Given the structure of the loss functions assumed, f_t(w) \ell( w, x_t), gradients will always be in the direction of the examples x_t, and so I think this theorem only holds when the data is actually low-rank.
Reviews: MetaGrad: Multiple Learning Rates in Online Learning
Online convex minimization is a fundamental problem in learning theory with various applications. Here, we are given a sequence of convex functions f1,...,fT in an online fashion and we seek to predict the minimizer of the sum so as to minimize the regret. This is a very well studied question with various guarantees known even when only given gradient information about the functions fi. For general functions this leads to a regret of sqrt(T) and for some special cases, we know better guarantees: O(log T) for strongly convex functions, O(d log T) for exp-convex functions etc. However, the algorithms that achieve the latter better guarantees are quite different and in some cases need to know bounds on the specific parameters of the functions (e.g., strong-convexity of the fi's, although this can now be avoided).
Online Learning under Adversarial Nonlinear Constraints
In many applications, learning systems are required to process continuous non-stationary data streams.We study this problem in an online learning framework and propose an algorithm that can deal with adversarial time-varying and nonlinear constraints.As we show in our work, the algorithm called Constraint Violation Velocity Projection (CVV-Pro) achieves \sqrt{T} regret and converges to the feasible set at a rate of 1/\sqrt{T}, despite the fact that the feasible set is slowly time-varying and a priori unknown to the learner. CVV-Pro only relies on local sparse linear approximations of the feasible set and therefore avoids optimizing over the entire set at each iteration, which is in sharp contrast to projected gradients or Frank-Wolfe methods. We also empirically evaluate our algorithm on two-player games, where the players are subjected to a shared constraint.
Optimal Learners for Realizable Regression: PAC Learning and Online Learning
In this work, we aim to characterize the statistical complexity of realizable regression both in the PAC learning setting and the online learning setting. Previous work had established the sufficiency of finiteness of the fat shattering dimension for PAC learnability and the necessity of finiteness of the scaled Natarajan dimension, but little progress had been made towards a more complete characterization since the work of Simon 1997 (SICOMP '97). To this end, we first introduce a minimax instance optimal learner for realizable regression and propose a novel dimension that both qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable. We then identify a combinatorial dimension related to the graph dimension that characterizes ERM learnability in the realizable setting. Finally, we establish a necessary condition for learnability based on a combinatorial dimension related to the DS dimension, and conjecture that it may also be sufficient in this context.
Towards Best-of-All-Worlds Online Learning with Feedback Graphs
We study the online learning with feedback graphs framework introduced by Mannor and Shamir (2011), in which the feedback received by the online learner is specified by a graph G over the available actions. We develop an algorithm that simultaneously achieves regret bounds of the form: O(\sqrt{\theta(G) T}) with adversarial losses; O(\theta(G)\mathrm{polylog}{T}) with stochastic losses; and O(\theta(G)\mathrm{polylog}{T} \sqrt{\theta(G) C}) with stochastic losses subject to C adversarial corruptions. Here, \theta(G) is the clique covering number of the graph G . Our algorithm is an instantiation of Follow-the-Regularized-Leader with a novel regularization that can be seen as a product of a Tsallis entropy component (inspired by Zimmert and Seldin (2019)) and a Shannon entropy component (analyzed in the corrupted stochastic case by Amir et al. (2020)), thus subtly interpolating between the two forms of entropies. One of our key technical contributions is in establishing the convexity of this regularizer and controlling its inverse Hessian, despite its complex product structure.
No-regret Online Learning over Riemannian Manifolds
We consider online optimization over Riemannian manifolds, where a learner attempts to minimize a sequence of time-varying loss functions defined on Riemannian manifolds. Though many Euclidean online convex optimization algorithms have been proven useful in a wide range of areas, less attention has been paid to their Riemannian counterparts. In this paper, we study Riemannian online gradient descent (R-OGD) on Hadamard manifolds for both geodesically convex and strongly geodesically convex loss functions, and Riemannian bandit algorithm (R-BAN) on Hadamard homogeneous manifolds for geodesically convex functions. We establish upper bounds on the regrets of the problem with respect to time horizon, manifold curvature, and manifold dimension. We also find a universal lower bound for the achievable regret by constructing an online convex optimization problem on Hadamard manifolds.
Riemannian Projection-free Online Learning
The projection operation is a critical component in a wide range of optimization algorithms, such as online gradient descent (OGD), for enforcing constraints and achieving optimal regret bounds. However, it suffers from computational complexity limitations in high-dimensional settings or when dealing with ill-conditioned constraint sets. Projection-free algorithms address this issue by replacing the projection oracle with more efficient optimization subroutines. But to date, these methods have been developed primarily in the Euclidean setting, and while there has been growing interest in optimization on Riemannian manifolds, there has been essentially no work in trying to utilize projection-free tools here. An apparent issue is that non-trivial affine functions are generally non-convex in such domains. In this paper, we present methods for obtaining sub-linear regret guarantees in online geodesically convex optimization on curved spaces for two scenarios: when we have access to (a) a separation oracle or (b) a linear optimization oracle.
Online Learning and Control of Complex Dynamical Systems from Sensory Input
Identifying an effective model of a dynamical system from sensory data and using it for future state prediction and control is challenging. Recent data-driven algorithms based on Koopman theory are a promising approach to this problem, but they typically never update the model once it has been identified from a relatively small set of observation, thus making long-term prediction and control difficult for realistic systems, in robotics or fluid mechanics for example. This paper introduces a novel method for learning an embedding of the state space with linear dynamics from sensory data. Unlike previous approaches, the dynamics model can be updated online and thus easily applied to systems with non-linear dynamics in the original configuration space. The proposed approach is evaluated empirically on several classical dynamical systems and sensory modalities, with good performance on long-term prediction and control.
Adversarial Attacks on Online Learning to Rank with Click Feedback
Online learning to rank (OLTR) is a sequential decision-making problem where a learning agent selects an ordered list of items and receives feedback through user clicks. Although potential attacks against OLTR algorithms may cause serious losses in real-world applications, there is limited knowledge about adversarial attacks on OLTR. This paper studies attack strategies against multiple variants of OLTR. Our first result provides an attack strategy against the UCB algorithm on classical stochastic bandits with binary feedback, which solves the key issues caused by bounded and discrete feedback that previous works cannot handle. Finally, we propose a general attack strategy against any algorithm under the general click model.
Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach
In this paper, we propose an online convex optimization approach with two different levels of adaptivity. On a higher level, our approach is agnostic to the unknown types and curvatures of the online functions, while at a lower level, it can exploit the unknown niceness of the environments and attain problem-dependent guarantees. Our result not only safeguards the worst-case guarantees but also directly implies the small-loss bounds in analysis. Moreover, when applied to adversarial/stochastic convex optimization and game theory problems, our result enhances the existing universal guarantees. Our approach is based on a multi-layer online ensemble framework incorporating novel ingredients, including a carefully designed optimism for unifying diverse function types and cascaded corrections for algorithmic stability.