Ask, Attend, Attack: An Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models

Mar-22-2026, 07:52:14 GMT–Neural Information Processing Systems

While image-to-text models have demonstrated significant advancements in various vision-language tasks, they remain susceptible to adversarial attacks. Existing white-box attacks on image-to-text models require access to the architecture, gradients, and parameters of the target model, resulting in low practicality. Although the recently proposed gray-box attacks have improved practicality, they suffer from semantic loss during the training process, which limits their targeted attack performance. To advance adversarial attacks of image-to-text models, this paper focuses on a challenging scenario: decision-based black-box targeted attacks where the attackers only have access to the final output text and aim to perform targeted attacks. Specifically, we formulate the decision-based black-box targeted attack as a large-scale optimization problem. To efficiently solve the optimization problem, a three-stage process \textit{Ask, Attend, Attack}, called \textit{AAA}, is proposed to coordinate with the solver.

artificial intelligence, machine learning, textit, (10 more...)

Neural Information Processing Systems

Mar-22-2026, 07:52:14 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.76)
  - Representation & Reasoning (0.62)