Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models
Neural Information Processing Systems
Pre-trained vision-language models, such as CLIP, have attracted widespread attention and adoption across various domains. Nonetheless, CLIP has been observed to be susceptible to adversarial examples. Through experimental analysis, we observe that adversarial perturbations induce shifts in text-guided attention. Building on this observation, we propose a simple yet effective strategy: Text-Guided Attention for Zero-Shot Robustness (TGA-ZSR). This framework comprises two components: an Attention Refinement module and an Attention-based Model Constraint module.
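To make the core idea concrete, here is a minimal sketch of how a text-guided attention map might be computed for a CLIP-style model: cosine similarity between each image patch embedding and the text embedding, normalized with a softmax. This is an illustrative assumption, not the paper's exact formulation; the function name and dimensions are hypothetical.

```python
import numpy as np

def text_guided_attention(patch_feats, text_feat):
    """Illustrative text-guided attention map: cosine similarity between
    each image patch embedding and the text embedding, softmax-normalized
    so the attention weights over patches sum to 1."""
    p = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    scores = p @ t                      # (num_patches,) similarity scores
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

# Hypothetical shapes: a 7x7 patch grid with 512-dim CLIP embeddings.
rng = np.random.default_rng(0)
patches = rng.normal(size=(49, 512))
text = rng.normal(size=512)
attn = text_guided_attention(patches, text)
print(attn.shape)
```

A robustness strategy in the spirit of TGA-ZSR would then compare such attention maps computed on clean versus adversarially perturbed images and penalize the shift between them.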