On the Power of Abstention and Data-Driven Decision Making for Adversarial Robustness
Balcan, Maria-Florina; Blum, Avrim; Sharma, Dravyansh; Zhang, Hongyang
What these results have in common is that changes that either are imperceptible or should be irrelevant to the classification task can lead to drastically different network behavior. One reason for this vulnerability to adversarial attack is the non-Lipschitzness property of typical neural networks: small but adversarial movements in the input space can often produce large perturbations in the feature space. In this work, we consider the question of whether non-Lipschitz networks are intrinsically vulnerable, or whether they could still be made robust to adversarial attack, in an abstract but (we believe) instructive adversarial model. In particular, suppose an adversary, by making an imperceptible change to an input x, can cause its representation F(x) in feature space (the penultimate layer of the network) to move by an arbitrary amount: will such an adversary always win? Clearly, if the adversary can modify F(x) by an arbitrary amount in an arbitrary direction, then yes. But what if the adversary can modify F(x) by an arbitrary amount but only in a random direction (which it cannot control)? In this case, we show an interesting dichotomy: if the classifier must output a classification on any input it is given, then yes, the adversary will still win, no matter how well-separated the classes are in feature space and no matter what decision surface the classifier uses.
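The random-direction model described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' algorithm: the linear classifier, the distance-threshold abstention rule, the synthetic data, and all names and parameters (`simulate`, `tau`, `separation`, and so on) are assumptions introduced here for illustration. The adversary is given a uniformly random line through F(x) and may move the point anywhere along it. Against a classifier that must always answer (here, the sign of the first coordinate), the adversary succeeds whenever the line is not parallel to the decision boundary. Against a hypothetical classifier that answers only within distance `tau` of a known point and abstains otherwise, the adversary succeeds only if the line happens to pass within `tau` of a point from the other class.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly at random on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def distance_point_to_line(p, x, u):
    """Distance from p to the line {x + t*u : t real}; u must be a unit vector."""
    diff = [pi - xi for pi, xi in zip(p, x)]
    along = sum(d * ui for d, ui in zip(diff, u))
    sq = sum(d * d for d in diff) - along * along
    return math.sqrt(max(sq, 0.0))


def forced_linear_fooled(u):
    """Assumed forced classifier: sign of coordinate 0.
    Some point x + t*u lies across the boundary whenever u[0] != 0."""
    return u[0] != 0.0


def abstaining_fooled(x, u, other_class_points, tau):
    """Assumed abstaining rule: answer only within distance tau of a known
    point. The adversary wins only if its line comes within tau of a point
    from the other class; being abstained on is not counted as a win."""
    return any(distance_point_to_line(p, x, u) <= tau for p in other_class_points)


def simulate(dim=50, n_points=20, separation=10.0, tau=1.0, trials=2000, seed=0):
    rng = random.Random(seed)
    # Attacked point: the centre of the "negative" class (hypothetical data).
    x = [-separation / 2.0] + [0.0] * (dim - 1)
    # Other class: small Gaussian cloud around the opposite centre.
    others = []
    for _ in range(n_points):
        p = [separation / 2.0] + [0.0] * (dim - 1)
        others.append([c + 0.1 * rng.gauss(0.0, 1.0) for c in p])
    forced_wins = 0
    abstain_wins = 0
    for _ in range(trials):
        u = random_unit_vector(dim, rng)
        forced_wins += forced_linear_fooled(u)
        abstain_wins += abstaining_fooled(x, u, others, tau)
    return forced_wins / trials, abstain_wins / trials


if __name__ == "__main__":
    forced_rate, abstain_rate = simulate()
    print("adversary success, forced classifier:    ", forced_rate)
    print("adversary success, abstaining classifier:", abstain_rate)
```

In this toy setup the contrast comes from geometry: a random line in a high-dimensional space almost always crosses a separating hyperplane, but it is very unlikely to pass close to any particular small cluster of points. Lowering `dim` or raising `tau` in the sketch makes the abstaining rule easier to fool.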
Oct-12-2020