A Complexity and generalization bounds
Combined with Theorem A.1, this yields a generalization bound for the SPO+ loss. Recall that Theorem 3.1, via the biconjugate in Lemma B.1, provides the relationship between the excess SPO risk and the optimal solution. Lemma B.4 provides a lower bound on the conditional SPO+ risk conditioned on the first (d − 1) coordinates, and Lemma B.5 provides a lower bound on the conditional SPO+ risk when the distribution of ... By the result in Lemma B.4, it holds that ... By Lemma B.3, it holds that ... Now we present a general version of Theorem 3.1; by Lemma B.5, we know that ... Herein we provide an example showing the tightness of the lower bound in Theorem B.1. First we provide some useful properties in the following lemma, then give the proofs of Lemmas C.1 and C.2. In Lemma C.1 we show that ... From Theorem C.2 and Lemma C.5, we know that ... Let ω = c − c̄; since p(c) = p(2c̄ − c), we have E ... Also, α ↦ ζ(α) is a non-decreasing function.
Malign Overfitting: Interpolation Can Provably Preclude Invariance
Wald, Yoav, Yona, Gal, Shalit, Uri, Carmon, Yair
Learned classifiers should often possess certain invariance properties meant to encourage fairness, robustness, or out-of-distribution generalization. However, multiple recent works empirically demonstrate that common invariance-inducing regularizers are ineffective in the over-parameterized regime, in which classifiers perfectly fit (i.e. interpolate) the training data. This suggests that the phenomenon of "benign overfitting", in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness are desirable. In this work we provide a theoretical justification for these observations. We prove that -- even in the simplest of settings -- any interpolating learning rule (with arbitrarily small margin) will not satisfy these invariance properties. We then propose and analyze an algorithm that -- in the same setting -- successfully learns a non-interpolating classifier that is provably invariant. We validate our theoretical observations on simulated data and the Waterbirds dataset.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
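The over-parameterized interpolation regime the abstract refers to can be sketched on synthetic data: when features outnumber samples, a minimum-norm linear fit achieves zero training error. This toy setup only illustrates the regime; it is not the paper's construction or algorithm.

```python
# Illustrative sketch (synthetic data, not the paper's construction): in the
# over-parameterized regime (d > n), the minimum-norm linear predictor
# interpolates, i.e. fits every training label exactly.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                 # n samples, d features, with d > n
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

# Minimum-norm interpolator: w = X^+ y via the Moore-Penrose pseudoinverse.
w = np.linalg.pinv(X) @ y

margins = y * (X @ w)          # y_i * <x_i, w>
print(np.allclose(X @ w, y))   # True: the training labels are fit exactly
print(margins.min() > 0)       # True: every training point is classified correctly
```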
Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query Complexity
Huang, Feihu, Gao, Shangqian, Pei, Jian, Huang, Heng
Zeroth-order (gradient-free) methods are a powerful class of optimization tools for many machine learning problems, because they require only function values (not gradients) during optimization. In particular, zeroth-order methods are well suited to complex problems such as black-box attacks and bandit feedback, whose explicit gradients are difficult or infeasible to obtain. Although many zeroth-order methods have been developed recently, these approaches still have two main drawbacks: 1) high function query complexity; 2) being ill-suited to problems with complex penalties and constraints. To address these drawbacks, in this paper we propose a novel fast zeroth-order stochastic alternating direction method of multipliers (ADMM) method (\emph{i.e.}, ZO-SPIDER-ADMM) with lower function query complexity for solving nonconvex problems with multiple nonsmooth penalties. Moreover, we prove that our ZO-SPIDER-ADMM has the optimal function query complexity of $O(dn + dn^{\frac{1}{2}}\epsilon^{-1})$ for finding an $\epsilon$-approximate local solution, where $n$ and $d$ denote the sample size and the dimension of the data, respectively. In particular, ZO-SPIDER-ADMM improves on the existing best nonconvex zeroth-order ADMM methods by a factor of $O(d^{\frac{1}{3}}n^{\frac{1}{6}})$. Moreover, we propose a fast online ZO-SPIDER-ADMM (\emph{i.e.}, ZOO-SPIDER-ADMM). Our theoretical analysis shows that ZOO-SPIDER-ADMM has a function query complexity of $O(d\epsilon^{-\frac{3}{2}})$, which improves on the existing best result by a factor of $O(\epsilon^{-\frac{1}{2}})$. Finally, we use a task of structured adversarial attack on black-box deep neural networks to demonstrate the efficiency of our algorithms.
- North America > United States (0.04)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
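The function-value-only gradient estimation behind methods like ZO-SPIDER-ADMM can be sketched with the standard two-point Gaussian-smoothing estimator. The objective `f`, smoothing radius `mu`, and direction count below are illustrative choices, not the paper's exact estimator.

```python
# Minimal sketch of a two-point zeroth-order gradient estimator, the
# gradient-free building block underlying zeroth-order ADMM methods.
# f, mu, and n_dirs are illustrative choices, not the paper's setup.
import numpy as np

def zo_gradient(f, x, mu=1e-4, n_dirs=200, seed=0):
    """Approximate grad f(x) using only function values:
    average (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over Gaussian u."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.normal(size=x.size)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

f = lambda x: 0.5 * np.sum(x ** 2)  # toy smooth objective; true gradient is x
x = np.array([1.0, -2.0, 3.0])
g_hat = zo_gradient(f, x)
# g_hat approaches the true gradient x as n_dirs grows and mu shrinks
```

Each query pair probes the objective along a random direction, so the cost is measured in function evaluations rather than gradient evaluations, which is exactly the quantity the abstract's query-complexity bounds count.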