Effective Targeted Attacks for Adversarial Self-Supervised Learning

Kim, Minseon, Ha, Hyeonjeong, Son, Sooel, Hwang, Sung Ju

Oct-26-2023–arXiv.org Artificial Intelligence

Recently, unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information. Previous studies in unsupervised AT have mostly focused on implementing self-supervised learning (SSL) frameworks, which maximize the instance-wise classification loss to generate adversarial examples. However, we observe that simply maximizing the self-supervised training loss with an untargeted adversarial attack often results in generating ineffective adversaries that may not help improve the robustness of the trained model, especially for non-contrastive SSL frameworks without negative examples. To tackle this problem, we propose a novel positive mining for targeted adversarial attack to generate effective adversaries for adversarial SSL frameworks. Specifically, we introduce an algorithm that selects the most confusing yet similar target example for a given instance based on entropy and similarity, and subsequently perturbs the given instance towards the selected target. Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and less but consistent robustness improvements with contrastive SSL frameworks, on the benchmark datasets.

adversarial example, learning, robustness, (15 more...)

arXiv.org Artificial Intelligence

Oct-26-2023

arXiv.org PDF

Add feedback

Country:
- Asia > Middle East > Jordan (0.04)

Genre:
- Research Report (0.64)

Industry:
- Information Technology > Security & Privacy (0.87)
- Government > Military (0.70)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Inductive Learning (1.00)
  - Neural Networks > Deep Learning (0.93)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found