Frank, Natalie S.
Adversarial Consistency and the Uniqueness of the Adversarial Bayes Classifier
Frank, Natalie S.
Minimizing an adversarial surrogate risk is a common technique for learning robust classifiers. Prior work showed that convex surrogate losses are not statistically consistent in the adversarial context-- or in other words, a minimizing sequence of the adversarial surrogate risk will not necessarily minimize the adversarial classification error. We connect the consistency of adversarial surrogate losses to properties of minimizers to the adversarial classification risk, known as adversarial Bayes classifiers. Specifically, under reasonable distributional assumptions, a convex surrogate loss is statistically consistent for adversarial learning iff the adversarial Bayes classifier satisfies a certain notion of uniqueness.
Existence and Minimax Theorems for Adversarial Surrogate Risks in Binary Classification
Frank, Natalie S., Niles-Weed, Jonathan
Adversarial training is one of the most popular methods for training methods robust to adversarial attacks, however, it is not well-understood from a theoretical perspective. We prove and existence, regularity, and minimax theorems for adversarial surrogate risks. Our results explain some empirical observations on adversarial robustness from prior work and suggest new directions in algorithm development. Furthermore, our results extend previously known existence and minimax theorems for the adversarial classification risk to surrogate risks.
Faithful and Efficient Explanations for Neural Networks via Neural Tangent Kernel Surrogate Models
Engel, Andrew, Wang, Zhichao, Frank, Natalie S., Dumitriu, Ioana, Choudhury, Sutanay, Sarwate, Anand, Chiang, Tony
A recent trend in explainable AI research has focused on surrogate modeling, where neural networks are approximated as simpler ML algorithms such as kernel machines. A second trend has been to utilize kernel functions in various explain-by-example or data attribution tasks to investigate a diverse set of neural network behavior. In this work, we combine these two trends to analyze approximate empirical neural tangent kernels (eNTK) for data attribution. Approximation is critical for eNTK analysis due to the high computational cost to compute the eNTK. We define new approximate eNTK and perform novel analysis on how well the resulting kernel machine surrogate models correlate with the underlying neural network. We introduce two new random projection variants of approximate eNTK which allow users to tune the time and memory complexity of their calculation. We conclude that kernel machines using approximate neural tangent kernel as the kernel function are effective surrogate models, with the introduced trace NTK the most consistent performer.
The Consistency of Adversarial Training for Binary Classification
Frank, Natalie S., Niles-Weed, Jonathan
Robustness to adversarial perturbations is of paramount concern in modern machine learning. One of the state-of-the-art methods for training robust classifiers is adversarial training, which involves minimizing a supremum-based surrogate risk. The statistical consistency of surrogate risks is well understood in the context of standard machine learning, but not in the adversarial setting. In this paper, we characterize which supremum-based surrogates are consistent for distributions absolutely continuous with respect to Lebesgue measure in binary classification. Furthermore, we obtain quantitative bounds relating adversarial surrogate risks to the adversarial classification risk. Lastly, we discuss implications for the $\cH$-consistency of adversarial training.
On the Existence of the Adversarial Bayes Classifier (Extended Version)
Awasthi, Pranjal, Frank, Natalie S., Mohri, Mehryar
Adversarial robustness is a critical property in a variety of modern machine learning applications. While it has been the subject of several recent theoretical studies, many important questions related to adversarial robustness are still open. In this work, we study a fundamental question regarding Bayes optimality for adversarial robustness. We provide general sufficient conditions under which the existence of a Bayes optimal classifier can be guaranteed for adversarial robustness. Our results can provide a useful tool for a subsequent study of surrogate losses in adversarial robustness and their consistency properties. This manuscript is the extended version of the paper On the Existence of the Adversarial Bayes Classifier published in NeurIPS (Awasthi et al., 2021b). The results of the original paper did not apply to some non-strictly convex norms. Here we extend our results to all possible norms.