Taming Adversarial Robustness via Abstaining
Abed AlRahman Al Makdah, Vaibhav Katewa, Fabio Pasqualetti
In this work, we consider a binary classification problem and cast it into a binary hypothesis testing framework, where the observations can be perturbed by an adversary. To improve the adversarial robustness of a classifier, we include an abstaining option, where the classifier abstains from taking a decision when it has low confidence in its prediction. We propose metrics to quantify the nominal performance of a classifier with an abstaining option and its robustness against adversarial perturbations. We show that there exists a tradeoff between the two metrics regardless of the method used to choose the abstaining region. Our results imply that the robustness of a classifier with abstaining can only be improved at the expense of its nominal performance. Further, we provide necessary conditions for designing the abstaining region for a 1-dimensional binary classification problem. We validate our theoretical results on the MNIST dataset, where we numerically show that the tradeoff between performance and robustness also exists for general multi-class classification problems.
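The tradeoff described in the abstract can be illustrated with a minimal sketch of a 1-dimensional binary hypothesis test with an abstaining region. Everything specific here is an assumption made for illustration and is not taken from the paper: the two hypotheses are unit-variance Gaussians centered at -mu and +mu with equal priors, the abstaining region is the symmetric interval [-a, a], the adversary shifts an observation by at most eps, and the function names and the three rates are hypothetical stand-ins rather than the metrics the authors propose.

```python
import math


def q(z):
    """Standard normal tail probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def classify(x, a):
    """Decide 1 if x > a, 0 if x < -a, and abstain (None) if |x| <= a."""
    if x > a:
        return 1
    if x < -a:
        return 0
    return None


def rates(a, eps, mu=1.0):
    """Illustrative rates for H0: N(-mu, 1) vs H1: N(+mu, 1), equal priors.

    Assumed setup, not the paper's definitions:
      correct   - probability of a correct (non-abstained) decision
      abstain   - probability that the observation lands in [-a, a]
      adv_error - probability that a shift of size eps can push the
                  observation into the wrong decision region
    """
    correct = q(a - mu)
    abstain = q(mu - a) - q(mu + a)
    adv_error = q(a - eps + mu)
    return correct, abstain, adv_error


if __name__ == "__main__":
    # Widening the abstaining region lowers the adversarial error rate,
    # but it also lowers the probability of a correct decision.
    for a in (0.0, 0.5, 1.0):
        correct, abstain, adv_error = rates(a, eps=0.3)
        print(f"a={a:.1f} correct={correct:.3f} "
              f"abstain={abstain:.3f} adv_error={adv_error:.3f}")
```

In this toy setting, increasing `a` can only reduce `adv_error` by also reducing `correct`, which mirrors the qualitative claim of the abstract without reproducing its formal result.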
Apr-6-2021
- Country:
  - Oceania > Australia
  - North America
  - Asia
    - Middle East > Jordan (0.04)
    - India > Karnataka
      - Bengaluru (0.04)
- Genre:
  - Research Report (1.00)
- Industry:
  - Transportation (0.93)
  - Health & Medicine (0.93)