Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games
Balcan, Maria-Florina; Pukdee, Rattana; Ravikumar, Pradeep; Zhang, Hongyang
arXiv.org Artificial Intelligence
Adversarial training is a standard technique for training adversarially robust models. In this paper, we study adversarial training as an alternating best-response strategy in a two-player zero-sum game. We prove that even in a simple setting, with a linear classifier and a statistical model that abstracts robust vs. non-robust features, the alternating best-response strategy of such a game may not converge. On the other hand, a unique pure Nash equilibrium of the game exists and is provably robust. We support our theoretical results with experiments that show the non-convergence of adversarial training and the robustness of the Nash equilibrium.
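The gap between "a pure Nash equilibrium exists" and "alternating best response converges" can be seen in a much smaller game than the one studied in the paper. The sketch below is a toy illustration only, not the authors' linear-classifier model: it assumes a scalar zero-sum game with payoff f(x, y) = x * y on [-1, 1] x [-1, 1], where x stands in for the learner (minimizer) and y for the adversary (maximizer). The function names and the tie-breaking rule at 0 are hypothetical choices made for this example.

```python
# Toy zero-sum game (an assumption for illustration, not the paper's model):
# payoff f(x, y) = x * y with x, y in [-1, 1].
# The minimizer picks x, the maximizer picks y.
# (0, 0) is a pure Nash equilibrium: neither player can improve by deviating.


def best_response_max(x):
    """Maximizer's best response to x: the y in [-1, 1] that maximizes x * y."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0  # every y is optimal when x == 0; returning 0 is a convention


def best_response_min(y):
    """Minimizer's best response to y: the x in [-1, 1] that minimizes x * y."""
    if y > 0:
        return -1.0
    if y < 0:
        return 1.0
    return 0.0  # every x is optimal when y == 0; returning 0 is a convention


def alternate(x0, rounds):
    """Run alternating best response, maximizer first, and record (x, y) per round."""
    x = x0
    history = []
    for _ in range(rounds):
        y = best_response_max(x)  # adversary reacts to the current x
        x = best_response_min(y)  # learner reacts to that y
        history.append((x, y))
    return history


if __name__ == "__main__":
    print(alternate(1.0, 4))  # oscillates between two corner points
    print(alternate(0.0, 3))  # stays at the equilibrium (0, 0)
```

Started away from the equilibrium, the iterates jump between corners of the square forever; started exactly at (0, 0), they stay put. This mirrors, in a deliberately simplified form, the abstract's claim that the equilibrium exists even though the alternating procedure may fail to reach it.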
Feb-27-2023
- Country:
  - Europe (0.28)
- Genre:
  - Research Report (0.50)
- Industry:
  - Leisure & Entertainment > Games (0.46)
- Technology: