Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
Qiwei Tian, Chenhao Lin, Zhengyu Zhao, Chao Shen
–arXiv.org Artificial Intelligence
To address the trade-off between robustness and performance in robust VLMs, we observe that function words can make VLMs vulnerable to cross-modal adversarial attacks, and accordingly propose Function-word De-Attention (FDA) to mitigate their impact. Analogous to a differential amplifier, FDA computes both the original cross-attention and the function-word cross-attention within attention heads, and differentially subtracts the latter from the former to yield better-aligned and more robust VLMs. Our comprehensive experiments cover 2 SOTA baselines under 6 different attacks on 2 downstream tasks, 3 datasets, and 3 models. Overall, FDA yields average ASR drops of 18/13/53% with performance drops of only 0.2/0.3/0.6% on the 3 tested models for retrieval, and a 90% ASR drop with a 0.3% performance gain on visual grounding. We experimentally demonstrate the scalability, generalization, and zero-shot performance of FDA, along with in-depth ablation studies and analysis. Code will be made publicly available.
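The abstract describes FDA as computing the original cross-attention and a function-word-only cross-attention inside an attention head, then differentially subtracting the latter from the former. The sketch below illustrates that idea for a single scaled dot-product attention head; the function-word mask, the subtraction weight `alpha`, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fda_cross_attention(q, k, v, func_word_mask, alpha=1.0):
    """Hypothetical sketch of Function-word De-Attention (FDA).

    q:              (Lq, d) query states (e.g. image tokens)
    k, v:           (Lk, d) key/value states (text tokens)
    func_word_mask: (Lk,) boolean, True at function-word token positions
    alpha:          assumed weight for the differential subtraction
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (Lq, Lk) attention logits
    attn = softmax(scores)                        # original cross-attention
    # Cross-attention restricted to function-word tokens only.
    fw_scores = np.where(func_word_mask, scores, -np.inf)
    fw_attn = softmax(fw_scores)
    # Differential subtraction, analogous to a differential amplifier:
    # down-weight attention mass placed on function words.
    de_attn = attn - alpha * fw_attn
    return de_attn @ v
```

In practice, the function-word positions would come from a fixed stop-word list or a POS tagger applied to the text tokens; here they are simply passed in as a boolean mask.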
Dec-10-2025