Language-Driven Anchors for Zero-Shot Adversarial Robustness
Xiao Li, Wei Zhang, Yining Liu, Zhanhao Hu, Bo Zhang, Xiaolin Hu
arXiv.org Artificial Intelligence
Deep neural networks are known to be susceptible to adversarial attacks. In this work, we focus on improving adversarial robustness in the challenging zero-shot image classification setting. To this end, we propose LAAT, a novel Language-driven, Anchor-based Adversarial Training strategy. LAAT uses a text encoder to generate a fixed anchor (a normalized feature embedding) for each category and then performs adversarial training against these anchors. By leveraging the semantic consistency of the text encoder, LAAT can enhance the adversarial robustness of the image model on novel categories without requiring additional examples. We identify the large cosine similarity problem of recent text encoders and design several effective techniques to address it. Experimental results demonstrate that LAAT significantly improves zero-shot adversarial performance, outperforming previous state-of-the-art adversarially robust one-shot methods. Moreover, our method yields substantial zero-shot adversarial robustness when models are trained on large datasets such as ImageNet-1K and applied to several downstream datasets.
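The core idea described above can be sketched in a few lines: each category gets a fixed, normalized anchor vector, and classification is the cosine similarity between an image embedding and each anchor. The following minimal NumPy sketch is illustrative only, not the paper's implementation; in LAAT the anchors come from a text encoder applied to category names, whereas here they are stand-in random vectors, and the PGD-based adversarial training loop is omitted.

```python
import numpy as np

def normalize(x, axis=-1):
    # Project feature vectors onto the unit sphere.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Hypothetical stand-ins for text-encoder anchors: LAAT would derive one
# anchor per category name from a text encoder; we use random vectors.
rng = np.random.default_rng(0)
num_classes, dim = 5, 32
anchors = normalize(rng.normal(size=(num_classes, dim)))  # fixed per-class anchors

def cosine_logits(image_features, anchors, temperature=0.07):
    # Score each class by the cosine similarity between the normalized
    # image embedding and that class's fixed anchor, scaled by temperature.
    feats = normalize(image_features)
    return feats @ anchors.T / temperature

def cross_entropy(logits, label):
    # Softmax cross-entropy pulling the image embedding toward its anchor.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

# An adversarially trained image encoder would minimize this loss on
# perturbed inputs; a novel category only needs a new text anchor,
# which is what enables the zero-shot transfer of robustness.
features = rng.normal(size=dim)
loss = cross_entropy(cosine_logits(features, anchors), label=2)
```

The "large cosine similarity problem" the abstract mentions would show up here as the rows of `anchors` being nearly parallel when produced by a real text encoder, which compresses the logit gaps between classes.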
Apr-10-2023