Andriushchenko, Maksym
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Mosbach, Marius, Andriushchenko, Maksym, Klakow, Dietrich
Fine-tuning pre-trained transformer-based language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks. Despite the strong empirical performance of fine-tuned models, fine-tuning is an unstable process: training the same model with multiple random seeds can result in a large variance of the task performance. Previous literature (Devlin et al., 2019; Lee et al., 2020; Dodge et al., 2020) identified two potential reasons for the observed instability: catastrophic forgetting and the small size of the fine-tuning datasets. In this paper, we show that both hypotheses fail to explain the fine-tuning instability. We analyze BERT, RoBERTa, and ALBERT, fine-tuned on three commonly used datasets from the GLUE benchmark, and show that the observed instability is caused by optimization difficulties that lead to vanishing gradients. Additionally, we show that the remaining variance of the downstream task performance can be attributed to differences in generalization, where fine-tuned models with the same training loss exhibit noticeably different test performance. Based on our analysis, we present a simple but strong baseline that makes fine-tuning BERT-based models significantly more stable than the previously proposed approaches.
Pre-trained transformer-based masked language models such as BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), and ALBERT (Lan et al., 2020) have had a dramatic impact on the NLP landscape in recent years. The standard recipe for using such models typically involves training a pre-trained model for a few epochs on a supervised downstream dataset, which is known as fine-tuning.
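For concreteness, the fine-tuning recipe described above can be sketched in a few lines of PyTorch/Transformers code. This is only an illustrative sketch in the spirit of the paper's suggested baseline (a small learning rate with bias-corrected AdamW and more epochs than the usual three); the model name, toy data, and hyperparameters are assumptions, not the paper's official configuration.

```python
# Minimal fine-tuning sketch; model name, data, and hyperparameters are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Small learning rate with bias correction (PyTorch's AdamW applies bias correction).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

# Toy stand-in for a GLUE-style dataset.
texts = ["a toy positive example", "a toy negative example"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

model.train()
for epoch in range(20):  # training longer than the usual 3 epochs
    optimizer.zero_grad()
    out = model(**batch, labels=labels)
    out.loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
```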
Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks
Croce, Francesco, Andriushchenko, Maksym, Singh, Naman D., Flammarion, Nicolas, Hein, Matthias
A large body of research has focused on adversarial attacks that are allowed to modify all input features with small $l_2$- or $l_\infty$-norms. In this paper we instead focus on query-efficient sparse attacks in the black-box setting. Our versatile framework, Sparse-RS, based on random search, achieves state-of-the-art success rate and query efficiency for different sparse attack models such as $l_0$-bounded perturbations (outperforming established white-box methods), adversarial patches, and adversarial framing. We show the effectiveness of Sparse-RS on different datasets, considering problems from image recognition and malware detection and multiple variations of sparse threat models, including targeted and universal perturbations. In particular, Sparse-RS can be used for realistic attacks such as universal adversarial patch attacks without requiring a substitute model. The code of our framework is available at https://github.com/fra31/sparse-rs.
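For intuition, the core of a random-search-based sparse ($l_0$) attack can be sketched as follows. This is a heavily simplified stand-in for Sparse-RS, which uses a more refined sampling schedule and also covers patches and framings; all names and parameters below are illustrative.

```python
# Simplified random-search l0 attack sketch (not the actual Sparse-RS algorithm).
import numpy as np

def random_search_l0_attack(loss_fn, x, k=50, n_queries=1000, rng=None):
    """loss_fn(x) returns a margin-style loss that is > 0 once x is misclassified.
    x: image in [0, 1] of shape (H, W, C); k: number of pixels allowed to change."""
    rng = rng or np.random.default_rng(0)
    h, w, c = x.shape
    # Start from k randomly chosen pixels set to random extreme (0/1) values.
    idx = rng.choice(h * w, size=k, replace=False)
    vals = rng.integers(0, 2, size=(k, c)).astype(x.dtype)
    best_x = x.copy().reshape(-1, c)
    best_x[idx] = vals
    best_loss = loss_fn(best_x.reshape(h, w, c))
    for _ in range(n_queries):
        cand_x = best_x.copy()
        # Randomly move one perturbed pixel to a new location with a new value
        # (collisions between perturbed pixels are ignored for simplicity).
        i = rng.integers(k)
        new_pos = rng.integers(h * w)
        cand_x[idx[i]] = x.reshape(-1, c)[idx[i]]   # restore the old pixel
        cand_x[new_pos] = rng.integers(0, 2, size=c)
        cand_loss = loss_fn(cand_x.reshape(h, w, c))
        if cand_loss > best_loss:                    # greedy acceptance
            best_x, best_loss = cand_x, cand_loss
            idx[i] = new_pos
        if best_loss > 0:                            # success: label flipped
            break
    return best_x.reshape(h, w, c)
```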
Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks
Andriushchenko, Maksym, Hein, Matthias
The problem of adversarial samples has been studied extensively for neural networks. However, for boosting, in particular boosted decision trees and decision stumps, there are almost no results, even though boosted decision trees, such as XGBoost, are quite popular due to their interpretability and good prediction performance. We show in this paper that for boosted decision stumps the exact min-max optimal robust loss and test error for an $l_\infty$-attack can be computed in $O(n\,T\log T)$, where $T$ is the number of decision stumps and $n$ the number of data points, and that an optimal update of the ensemble can be computed in $O(n^2\,T\log T)$. For boosted trees, where we do not compute the exact robust loss, we show how to optimize an upper bound on it. To the best of our knowledge, these are the first algorithms directly optimizing provable robustness guarantees in the area of boosting. We make the code of all our experiments publicly available at https://github.com/max-andr/provably-robust-boosting.
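The key observation that makes exact certification cheap for stumps is that each stump depends on a single coordinate, so an $l_\infty$-adversary can be handled coordinate by coordinate. The sketch below illustrates this on a toy stump ensemble; the stump representation and function names are illustrative assumptions, not the paper's implementation.

```python
# Exact l_inf-robust margin for a toy decision-stump ensemble (illustrative sketch).
import numpy as np

def certified_margin(stumps, x, y, eps):
    """stumps: list of (coord, threshold, w_left, w_right); the prediction is
    sign(sum of stump outputs); y in {-1, +1}. Returns min over ||delta||_inf <= eps
    of y * f(x + delta). This is exact for stumps: each coordinate can be
    perturbed independently, so the minimum decomposes per coordinate."""
    coords = {}
    for (j, t, wl, wr) in stumps:
        coords.setdefault(j, []).append((t, wl, wr))
    worst = 0.0
    for j, group in coords.items():
        # The summed contribution of stumps on coordinate j is piecewise constant
        # in x_j, so it suffices to check the interval endpoints and the points
        # at / just above each threshold inside the interval.
        candidates = [x[j] - eps, x[j] + eps]
        for (t, _, _) in group:
            if x[j] - eps <= t <= x[j] + eps:
                candidates += [t, np.nextafter(t, np.inf)]
        def signed_sum(v):
            return sum(y * (wl if v <= t else wr) for (t, wl, wr) in group)
        worst += min(signed_sum(v) for v in candidates)
    return worst  # > 0 means the prediction is certified correct on the eps-ball

# Example: one stump on coordinate 0 with threshold 0.5.
stumps = [(0, 0.5, -1.0, +1.0)]
print(certified_margin(stumps, x=np.array([0.6]), y=+1, eps=0.2))  # -1.0: the adversary can cross the threshold
```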
Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem
Hein, Matthias, Andriushchenko, Maksym, Bitterwolf, Julian
Classifiers used in the wild, in particular in safety-critical systems, should not only have good generalization properties but should also know when they don't know, in particular by making low-confidence predictions far away from the training data. We show that ReLU-type neural networks, which yield a piecewise linear classifier function, fail in this regard, as they almost always produce high-confidence predictions far away from the training data. For bounded domains like images, we propose a new robust optimization technique, similar to adversarial training, which enforces low-confidence predictions far away from the training data. We show that this technique is surprisingly effective in reducing the confidence of predictions far away from the training data, while maintaining high-confidence predictions and similar test error on the original classification task compared to standard training.
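The basic training idea, enforcing low confidence away from the data, can be sketched as an auxiliary penalty on noise inputs. The sketch below omits the paper's inner adversarial maximization over the noise points, and the model, noise distribution, and weighting are illustrative assumptions.

```python
# Sketch: cross-entropy on real data plus a low-confidence penalty on noise inputs.
import torch
import torch.nn.functional as F

def low_confidence_loss(model, x, y, lam=1.0):
    # Standard classification loss on the in-distribution batch.
    logits = model(x)
    ce = F.cross_entropy(logits, y)
    # Uniform-noise inputs stand in for points "far away" from the training data.
    noise = torch.rand_like(x)
    noise_logits = model(noise)
    # Penalize the maximal log-probability on noise, pushing the predictive
    # distribution there toward the maximally uncertain (uniform) one.
    conf = noise_logits.log_softmax(dim=1).max(dim=1).values
    return ce + lam * conf.mean()
```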
Logit Pairing Methods Can Fool Gradient-Based Attacks
Mosbach, Marius, Andriushchenko, Maksym, Trost, Thomas, Hein, Matthias, Klakow, Dietrich
Recently, several logit regularization methods have been proposed in [Kannan et al., 2018] to improve the adversarial robustness of classifiers. We show that the proposed computationally fast methods, Clean Logit Pairing (CLP) and Logit Squeezing (LSQ), merely make the gradient-based optimization problem of crafting adversarial examples harder, without providing actual robustness. For Adversarial Logit Pairing (ALP) we find that it can indeed provide robustness against adversarial examples, and we study it in different settings. In particular, we show that ALP may provide additional robustness when combined with adversarial training. However, the increase is much smaller than claimed by [Kannan et al., 2018]. Finally, our results suggest that evaluation against an iterative PGD attack depends heavily on the attack parameters used and may lead to false conclusions regarding robustness.
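For reference, the regularizers under study roughly take the following form, following the definitions of Kannan et al. (2018); the loss weights, the exact norm used, and the way adversarial examples are generated are illustrative assumptions in this sketch.

```python
# Rough sketches of Logit Squeezing (LSQ) and Adversarial Logit Pairing (ALP) losses.
import torch
import torch.nn.functional as F

def logit_squeezing_loss(model, x, y, lam=0.05):
    logits = model(x)
    # LSQ: cross-entropy plus a penalty on the logit norms of clean inputs.
    return F.cross_entropy(logits, y) + lam * logits.norm(dim=1).mean()

def adversarial_logit_pairing_loss(model, x, x_adv, y, lam=0.5):
    logits, logits_adv = model(x), model(x_adv)
    # ALP: adversarial-training loss plus pairing of clean and adversarial
    # logits for the same underlying input (x_adv produced by, e.g., PGD).
    ce = F.cross_entropy(logits_adv, y)
    pairing = ((logits - logits_adv) ** 2).mean()
    return ce + lam * pairing
```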
Provable Robustness of ReLU networks via Maximization of Linear Regions
Croce, Francesco, Andriushchenko, Maksym, Hein, Matthias
It has been shown that neural network classifiers are not robust. This raises concerns about their usage in safety-critical systems. We propose in this paper a regularization scheme for ReLU networks which provably improves the robustness of the classifier by maximizing the linear regions of the classifier as well as the distance to the decision boundary. Our techniques even allow us to find the minimal adversarial perturbation for a fraction of test points for large networks. In the experiments we show that our approach improves upon adversarial training both in terms of lower and upper bounds on the robustness, and is comparable to or better than the state of the art in terms of test error and robustness.
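To make the geometric idea concrete: within one of its linear regions a ReLU network is affine, $f(x) = Vx + a$, so the distance of $x$ to the local decision boundary has a closed form. The expression below is only a sketch of the kind of quantity being pushed up; the paper's actual regularizer combines such terms for both decision-boundary and region-boundary hyperplanes and is not reproduced here.

```latex
% Sketch of the local distance to the decision boundary inside a linear region,
% assuming f(x) = Vx + a there, with predicted class c and rows v_k of V:
\[
  d_{\mathcal{D}}(x) \;=\; \min_{j \neq c}\;
  \frac{(v_c - v_j)^\top x + (a_c - a_j)}{\lVert v_c - v_j \rVert_q},
\]
% where \|.\|_q is the norm dual to the l_p-norm used to measure perturbations;
% the expression is valid as long as the closest boundary point stays in the region.
```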
Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation
Hein, Matthias, Andriushchenko, Maksym
Recent work has shown that state-of-the-art classifiers are quite brittle, in the sense that a small adversarial change to an input that was originally classified correctly with high confidence leads to a wrong classification, again with high confidence. This raises concerns that such classifiers are vulnerable to attacks and calls into question their usage in safety-critical systems. We show in this paper, for the first time, formal guarantees on the robustness of a classifier by giving instance-specific lower bounds on the norm of the input manipulation required to change the classifier decision. Based on this analysis we propose the Cross-Lipschitz regularization functional. We show that using this form of regularization in kernel methods and neural networks improves the robustness of the classifier without any loss in prediction performance.
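The flavor of the instance-specific guarantee can be sketched as follows; this is stated only up to the radius restriction and constants used in the paper and is meant as an illustration rather than the exact theorem.

```latex
% Sketch: if x is classified as c and delta changes the decision while staying
% in the ball B_p(x, R), then roughly
\[
  \lVert \delta \rVert_p \;\ge\; \min_{j \neq c}\;
  \frac{f_c(x) - f_j(x)}{\max_{y \in B_p(x, R)} \lVert \nabla f_c(y) - \nabla f_j(y) \rVert_q},
\]
% with \|.\|_q the dual norm of \|.\|_p. A large class-score gap and a small
% "cross-Lipschitz" constant of the class differences thus certify robustness,
% which motivates penalizing \|\nabla f_l(x_i) - \nabla f_m(x_i)\| during training.
```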