Gradient-based Adversarial Attacks against Text Transformers
Guo, Chuan; Sablayrolles, Alexandre; Jégou, Hervé; Kiela, Douwe
arXiv.org Artificial Intelligence
We propose the first general-purpose gradient-based attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks. Furthermore, we show that a powerful black-box transfer attack, enabled by sampling from the adversarial distribution, matches or exceeds existing methods, while only requiring hard-label outputs.
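The abstract's central idea is to replace a search over discrete sentences with gradient-based optimization of a continuous matrix that parameterizes a distribution over sentences. The toy sketch below illustrates that idea. Everything in it is an assumption for illustration, not the authors' code: the bag-of-words linear victim, the weights `w`, the starting sentence `x0`, and the helpers `expected_score`, `optimize`, `sample`, `transfer_attack` and `victim_label` are all hypothetical. Because this toy victim is linear, the expected score under the per-position softmax distribution has an exact closed-form gradient. A transformer victim offers no such closed form and would need a differentiable relaxation of the sampling step, which is not shown here. The sketch also omits any fluency or semantic-similarity constraints that a practical attack would likely add.

```python
import math
import random

# Toy assumptions: the victim scores a sequence as sum(w[token]); score > 0 means class 1, else class 0.
# theta is an n x V matrix of logits; P_theta draws token i independently from softmax(theta[i]).


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def expected_score(theta, w):
    """E_{x ~ P_theta}[sum_i w[x_i]], available in closed form for this linear toy victim."""
    return sum(sum(p * wv for p, wv in zip(softmax(row), w)) for row in theta)


def grad_expected_score(theta, w):
    """Exact gradient of expected_score with respect to every entry of theta.

    With pi = softmax(theta[i]), the derivative of sum_u pi[u] * w[u]
    with respect to theta[i][v] is pi[v] * (w[v] - sum_u pi[u] * w[u]).
    """
    grads = []
    for row in theta:
        pi = softmax(row)
        mean_w = sum(p * wv for p, wv in zip(pi, w))
        grads.append([p * (wv - mean_w) for p, wv in zip(pi, w)])
    return grads


def optimize(theta, w, lr=1.0, steps=200):
    """Plain gradient descent on theta, lowering the expected score (pushing toward class 0)."""
    for _ in range(steps):
        g = grad_expected_score(theta, w)
        theta = [[t - lr * gv for t, gv in zip(row, g_row)] for row, g_row in zip(theta, g)]
    return theta


def sample(theta, rng):
    """Draw one discrete token sequence from P_theta: one categorical draw per position."""
    return [rng.choices(range(len(row)), weights=softmax(row))[0] for row in theta]


def transfer_attack(theta, victim_label, target, rng, max_queries=100):
    """Black-box step: sample candidates and read only the victim's hard label.

    Returns (adversarial sequence, queries used), or (None, max_queries) if none succeeds.
    """
    for q in range(1, max_queries + 1):
        x = sample(theta, rng)
        if victim_label(x) == target:
            return x, q
    return None, max_queries


# Hypothetical setup: 4-token vocabulary, original sentence x0 (score 4.5, class 1).
w = [2.0, 0.5, -1.0, -3.0]
x0 = [0, 0, 1]
# Start the distribution concentrated near x0 (logit 2.0 on each original token).
theta0 = [[2.0 if v == tok else 0.0 for v in range(len(w))] for tok in x0]
theta = optimize(theta0, w)


def victim_label(x):
    """Hard-label oracle; for brevity it reuses w, whereas a real transfer target would be a different model."""
    return 1 if sum(w[t] for t in x) > 0 else 0


x_adv, n_queries = transfer_attack(theta, victim_label, target=0, rng=random.Random(0))
print(expected_score(theta0, w), expected_score(theta, w), x_adv, n_queries)
```

The final step follows the black-box claim in spirit: once `theta` has been optimized, the attacker only samples sequences and reads hard labels, with no access to scores or gradients. Because the oracle here shares `w` with the optimized distribution, this is not a true transfer experiment. In that setting, `victim_label` would wrap a separate model.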
Apr-15-2021
- Country:
  - Asia (0.47)
  - Europe (0.68)
  - North America > United States
    - Minnesota > Hennepin County > Minneapolis (0.14)
- Genre:
  - Research Report (0.82)
- Industry:
  - Government (0.65)
  - Information Technology > Security & Privacy (0.83)
- Technology: