Blanchard, Moïse
Agnostic Smoothed Online Learning
Blanchard, Moïse
Classical results in statistical learning typically consider two extreme data-generating models: i.i.d. instances from an unknown distribution, or fully adversarial instances, often much more challenging statistically. To bridge the gap between these models, recent work introduced the smoothed framework, in which at each iteration an adversary generates instances from a distribution constrained to have density bounded by $\sigma^{-1}$ compared to some fixed base measure $\mu$. This framework interpolates between the i.i.d. and adversarial cases, depending on the value of $\sigma$. For the classical online prediction problem, most prior results in smoothed online learning rely on the arguably strong assumption that the base measure $\mu$ is known to the learner, contrasting with standard settings in the PAC learning or consistency literature. We consider the general agnostic problem in which the base measure is unknown and values are arbitrary. Along this direction, Block et al. showed that empirical risk minimization has sublinear regret under the well-specified assumption. We propose an algorithm R-Cover based on recursive coverings which is the first to guarantee sublinear regret for agnostic smoothed online learning without prior knowledge of $\mu$. For classification, we prove that R-Cover has adaptive regret $\tilde O(\sqrt{dT/\sigma})$ for function classes with VC dimension $d$, which is optimal up to logarithmic factors. For regression, we establish that R-Cover has sublinear oblivious regret for function classes with polynomial fat-shattering dimension growth.
Memory-Constrained Algorithms for Convex Optimization via Recursive Cutting-Planes
Blanchard, Moïse, Zhang, Junhui, Jaillet, Patrick
We propose a family of recursive cutting-plane algorithms to solve feasibility problems with constrained memory, which can also be used for first-order convex optimization. Precisely, in order to find a point within a ball of radius $\epsilon$ with a separation oracle in dimension $d$ -- or to minimize $1$-Lipschitz convex functions to accuracy $\epsilon$ over the unit ball -- our algorithms use $\mathcal O(\frac{d^2}{p}\ln \frac{1}{\epsilon})$ bits of memory, and make $\mathcal O((C\frac{d}{p}\ln \frac{1}{\epsilon})^p)$ oracle calls, for some universal constant $C \geq 1$. The family is parametrized by $p\in[d]$ and provides an oracle-complexity/memory trade-off in the sub-polynomial regime $\ln\frac{1}{\epsilon}\gg\ln d$. While several works gave lower-bound trade-offs (impossibility results) -- we explicit here their dependence with $\ln\frac{1}{\epsilon}$, showing that these also hold in any sub-polynomial regime -- to the best of our knowledge this is the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon\leq 1/\sqrt d$. The algorithms divide the $d$ variables into $p$ blocks and optimize over blocks sequentially, with approximate separation vectors constructed using a variant of Vaidya's method. In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p=d$ achieves the information-theoretic optimal memory usage and improves the oracle-complexity of gradient descent.
Universal Regression with Adversarial Responses
Blanchard, Moïse, Jaillet, Patrick
We provide algorithms for regression with adversarial responses under large classes of non-i.i.d. instance sequences, on general separable metric spaces, with provably minimal assumptions. We also give characterizations of learnability in this regression context. We consider universal consistency which asks for strong consistency of a learner without restrictions on the value responses. Our analysis shows that such an objective is achievable for a significantly larger class of instance sequences than stationary processes, and unveils a fundamental dichotomy between value spaces: whether finite-horizon mean estimation is achievable or not. We further provide optimistically universal learning rules, i.e., such that if they fail to achieve universal consistency, any other algorithms will fail as well. For unbounded losses, we propose a mild integrability condition under which there exist algorithms for adversarial regression under large classes of non-i.i.d. instance sequences. In addition, our analysis also provides a learning rule for mean estimation in general metric spaces that is consistent under adversarial responses without any moment conditions on the sequence, a result of independent interest.
Quadratic Memory is Necessary for Optimal Query Complexity in Convex Optimization: Center-of-Mass is Pareto-Optimal
Blanchard, Moïse, Zhang, Junhui, Jaillet, Patrick
We give query complexity lower bounds for convex optimization and the related feasibility problem. We show that quadratic memory is necessary to achieve the optimal oracle complexity for first-order convex optimization. In particular, this shows that center-of-mass cutting-planes algorithms in dimension $d$ which use $\tilde O(d^2)$ memory and $\tilde O(d)$ queries are Pareto-optimal for both convex optimization and the feasibility problem, up to logarithmic factors. Precisely, we prove that to minimize $1$-Lipschitz convex functions over the unit ball to $1/d^4$ accuracy, any deterministic first-order algorithms using at most $d^{2-\delta}$ bits of memory must make $\tilde\Omega(d^{1+\delta/3})$ queries, for any $\delta\in[0,1]$. For the feasibility problem, in which an algorithm only has access to a separation oracle, we show a stronger trade-off: for at most $d^{2-\delta}$ memory, the number of queries required is $\tilde\Omega(d^{1+\delta})$. This resolves a COLT 2019 open problem of Woodworth and Srebro.
Universal Online Learning: an Optimistically Universal Learning Rule
Blanchard, Moïse
We study the subject of universal online learning with non-i.i.d. processes for bounded losses. The notion of an universally consistent learning was defined by Hanneke in an effort to study learning theory under minimal assumptions, where the objective is to obtain low long-run average loss for any target function. We are interested in characterizing processes for which learning is possible and whether there exist learning rules guaranteed to be universally consistent given the only assumption that such learning is possible. The case of unbounded losses is very restrictive, since the learnable processes almost surely visit a finite number of points and as a result, simple memorization is optimistically universal. We focus on the bounded setting and give a complete characterization of the processes admitting strong and weak universal learning. We further show that k-nearest neighbor algorithm (kNN) is not optimistically universal and present a novel variant of 1NN which is optimistically universal for general input and value spaces in both strong and weak setting. This closes all COLT 2021 open problems posed by Hanneke on universal online learning.