Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation

Dec-25-2025, 04:21:58 GMT–Neural Information Processing Systems

Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples, which induce erroneous predictions through imperceptible perturbations injected into the input. In this work, we study the transferability of adversarial examples, which is significant because it threatens real-world applications where the model architecture or parameters are usually unknown. Many existing works reveal that adversarial examples tend to overfit the surrogate model from which they are generated, limiting their transfer attack performance against different target models. To mitigate this overfitting to the surrogate model, we propose a novel attack method, dubbed reverse adversarial perturbation (RAP). Specifically, instead of minimizing the loss at a single adversarial point, we advocate seeking an adversarial example located in a region with uniformly low loss values, by injecting the worst-case perturbation (the reverse adversarial perturbation) at each step of the optimization procedure.
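
The min-max idea described in the abstract can be sketched roughly as follows. This is a minimal illustration on a toy logistic surrogate, not the authors' implementation: the function names (`attack_loss_and_grad`, `rap_attack`), the budgets `eps` and `eps_rap`, the step sizes, and the use of sign-gradient updates are all assumptions made for the example. At each outer step, an inner loop searches for the perturbation that most increases the attack loss (the reverse adversarial perturbation), and the adversarial example is then updated to lower the loss at that worst-case point.

```python
import numpy as np

def attack_loss_and_grad(x, w, b, target):
    """Cross-entropy of a toy logistic surrogate w.r.t. the attacker's target label.

    Hypothetical stand-in for a real surrogate model; lower loss means the
    surrogate is closer to predicting `target`.
    """
    z = float(w @ x + b)
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -(target * np.log(p + 1e-12) + (1 - target) * np.log(1 - p + 1e-12))
    grad = (p - target) * w  # gradient of the loss with respect to x
    return loss, grad

def rap_attack(x, w, b, target, eps=0.5, eps_rap=0.2,
               steps=50, inner_steps=5, lr=0.05, lr_rap=0.05):
    """Sketch of a min-max attack: minimize the loss at the worst-case nearby point."""
    x_adv = x.copy()
    for _ in range(steps):
        # Inner maximization: reverse perturbation n that raises the attack loss,
        # kept inside an L-infinity ball of radius eps_rap (assumed budget).
        n = np.zeros_like(x)
        for _ in range(inner_steps):
            _, g = attack_loss_and_grad(x_adv + n, w, b, target)
            n = np.clip(n + lr_rap * np.sign(g), -eps_rap, eps_rap)
        # Outer minimization: descend on the loss evaluated at x_adv + n.
        _, g = attack_loss_and_grad(x_adv + n, w, b, target)
        x_adv = x_adv - lr * np.sign(g)
        # Project back into the L-infinity ball of radius eps around the clean input.
        x_adv = x + np.clip(x_adv - x, -eps, eps)
    return x_adv

# Example usage with made-up inputs.
x_clean = np.array([1.0, -2.0, 0.5])
weights = np.array([0.8, -0.6, 0.3])
x_adv = rap_attack(x_clean, weights, 0.1, target=0)
```

With a real DNN surrogate, the analytic gradient would be replaced by automatic differentiation; the loop structure is the part this sketch is meant to convey.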

adversarial attack, adversarial example, transferability, (7 more...)

Industry:
- Information Technology > Security & Privacy (0.45)
- Government > Military (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.59)
  - Representation & Reasoning > Optimization (0.39)