Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models

Ziyi Yin, Muchao Ye

Feb-16-2026, 08:15:25 GMT–Neural Information Processing Systems

Vision-Language (VL) pre-trained models have shown strong performance on many multimodal tasks. However, the adversarial robustness of such models has not been fully explored. Existing approaches mainly study adversarial robustness under the white-box setting, which is unrealistic in practice. In this paper, we investigate a new yet practical task: crafting image and text perturbations with pre-trained VL models to attack black-box fine-tuned models on different downstream tasks.

Country:
- North America > United States
  - Pennsylvania (0.04)
  - New York > Suffolk County
    - Stony Brook (0.04)
  - Georgia > Fulton County
    - Atlanta (0.04)
- Asia > China
  - Shaanxi Province > Xi'an (0.04)
  - Liaoning Province > Dalian (0.04)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Information Technology > Security & Privacy (1.00)
- Government (0.84)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Security & Privacy (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language > Text Processing (0.69)
    - Machine Learning > Neural Networks (0.67)