Attias, Idan, Hanneke, Steve, Mansour, Yishay

We study the problem of semi-supervised learning of an adversarially-robust predictor in the PAC model, where the learner has access to both labeled and unlabeled examples. The sample complexity in semi-supervised learning has two parameters, the number of labeled examples and the number of unlabeled examples. We consider the complexity measures, $VC_U \leq dim_U \leq VC$ and $VC^*$, where $VC$ is the standard $VC$-dimension, $VC^*$ is its dual, and the other two measures appeared in Montasser et al. (2019). The best sample bound known for robust supervised PAC learning is $O(VC \cdot VC^*)$, and we will compare our sample bounds to $\Lambda$ which is the minimal number of labeled examples required by any robust supervised PAC learning algorithm. Our main results are the following: (1) in the realizable setting it is sufficient to have $O(VC_U)$ labeled examples and $O(\Lambda)$ unlabeled examples. (2) In the agnostic setting, let $\eta$ be the minimal agnostic error. The sample complexity depends on the resulting error rate. If we allow an error of $2\eta+\epsilon$, it is still sufficient to have $O(VC_U)$ labeled examples and $O(\Lambda)$ unlabeled examples. If we insist on having an error $\eta+\epsilon$ then $\Omega(dim_U)$ labeled examples are necessary, as in the supervised case. The above results show that there is a significant benefit in semi-supervised robust learning, as there are hypothesis classes with $VC_U=0$ and $dim_U$ arbitrary large. In supervised learning, having access only to labeled examples requires at least $\Lambda \geq dim_U$ labeled examples. Semi-supervised require only $O(1)$ labeled examples and $O(\Lambda)$ unlabeled examples. A byproduct of our result is that if we assume that the distribution is robustly realizable by a hypothesis class, then with respect to the 0-1 loss we can learn with only $O(VC_U)$ labeled examples, even if the $VC$ is infinite.

Montasser, Omar, Hanneke, Steve, Srebro, Nathan

We study the problem of learning predictors that are robust to adversarial examples with respect to an unknown perturbation set, relying instead on interaction with an adversarial attacker or access to attack oracles, examining different models for such interactions. We obtain upper bounds on the sample complexity and upper and lower bounds on the number of required interactions, or number of successful attacks, in different interaction models, in terms of the VC and Littlestone dimensions of the hypothesis class of predictors, and without any assumptions on the perturbation set.

Blum, Avrim, Montasser, Omar, Shakhnarovich, Greg, Zhang, Hongyang

We present an oracle-efficient algorithm for boosting the adversarial robustness of barely robust learners. Barely robust learning algorithms learn predictors that are adversarially robust only on a small fraction $\beta \ll 1$ of the data distribution. Our proposed notion of barely robust learning requires robustness with respect to a "larger" perturbation set; which we show is necessary for strongly robust learning, and that weaker relaxations are not sufficient for strongly robust learning. Our results reveal a qualitative and quantitative equivalence between two seemingly unrelated problems: strongly robust learning and barely robust learning.

Montasser, Omar, Hanneke, Steve, Srebro, Nathan

We study the question of learning an adversarially robust predictor. We show that any hypothesis class $\mathcal{H}$ with finite VC dimension is robustly PAC learnable with an improper learning rule. The requirement of being improper is necessary as we exhibit examples of hypothesis classes $\mathcal{H}$ with finite VC dimension that are not robustly PAC learnable with any proper learning rule.

Goldwasser, Shafi, Kalai, Adam Tauman, Kalai, Yael Tauman, Montasser, Omar

We present a transductive learning algorithm that takes as input training examples from a distribution $P$ and arbitrary (unlabeled) test examples, possibly chosen by an adversary. This is unlike prior work that assumes that test examples are small perturbations of $P$. Our algorithm outputs a selective classifier, which abstains from predicting on some examples. By considering selective transductive learning, we give the first nontrivial guarantees for learning classes of bounded VC dimension with arbitrary train and test distributions---no prior guarantees were known even for simple classes of functions such as intervals on the line. In particular, for any function in a class $C$ of bounded VC dimension, we guarantee a low test error rate and a low rejection rate with respect to $P$. Our algorithm is efficient given an Empirical Risk Minimizer (ERM) for $C$. Our guarantees hold even for test examples chosen by an unbounded white-box adversary. We also give guarantees for generalization, agnostic, and unsupervised settings.