Gradient-based Adversarial Attacks against Text Transformers
Guo, Chuan; Sablayrolles, Alexandre; Jégou, Hervé; Kiela, Douwe
arXiv.org Artificial Intelligence
We propose the first general-purpose gradient-based attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks. Furthermore, we show that a powerful black-box transfer attack, enabled by sampling from the adversarial distribution, matches or exceeds existing methods, while only requiring hard-label outputs.
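The core idea in the abstract, optimizing a continuous matrix that defines a distribution over token sequences and then sampling discrete adversarial examples from it, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the "model" is a toy scorer with hypothetical per-token weights rather than a transformer, the names `TOKEN_WEIGHTS`, `optimize_distribution`, and `sample_sequence` are invented, and the loss is the expected score under a plain softmax, which is a simplification and not necessarily the relaxation the authors use.

```python
import math
import random

# Toy stand-in for a classifier (hypothetical): each token id has a fixed
# scalar weight, and the "model" predicts the positive class when the summed
# weight of a sequence is above zero. A real attack would differentiate
# through a transformer instead.
TOKEN_WEIGHTS = [1.5, 0.8, -0.3, -1.2, 0.1]  # vocabulary of 5 token ids


def model_score(tokens):
    return sum(TOKEN_WEIGHTS[t] for t in tokens)


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def optimize_distribution(original, steps=200, lr=0.5, init_bias=3.0):
    """Run gradient descent on a matrix theta (positions x vocabulary).

    Row i of softmax(theta) is the token distribution at position i. The loss
    is the expected model score under that distribution, which is
    differentiable in theta even though the tokens themselves are discrete.
    """
    vocab = len(TOKEN_WEIGHTS)
    # Start with most probability mass on the original tokens.
    theta = [[init_bias if v == tok else 0.0 for v in range(vocab)]
             for tok in original]
    for _ in range(steps):
        for row in theta:
            probs = softmax(row)
            expected = sum(p * w for p, w in zip(probs, TOKEN_WEIGHTS))
            # d(expected)/d(row[j]) = probs[j] * (w_j - expected)
            for j in range(vocab):
                row[j] -= lr * probs[j] * (TOKEN_WEIGHTS[j] - expected)
    return theta


def sample_sequence(theta, rng):
    """Draw one discrete token sequence from the learned distribution."""
    return [rng.choices(range(len(row)), weights=softmax(row))[0]
            for row in theta]


original = [0, 1, 0, 4]  # scored positive by the toy model
theta = optimize_distribution(original)
rng = random.Random(0)
samples = [sample_sequence(theta, rng) for _ in range(20)]
flipped = sum(model_score(s) < 0 for s in samples)
```

Because the optimized object is a distribution rather than a single sequence, many candidate inputs can be drawn from it afterwards. In a transfer setting, each sample would be sent to a separate target model and only its predicted label checked, which is consistent with the hard-label requirement the abstract mentions.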
Apr-15-2021