On the Effectiveness of Distillation in Mitigating Backdoors in Pre-trained Encoder

Han, Tingxu, Huang, Shenghan, Ding, Ziqi, Sun, Weisong, Feng, Yebo, Fang, Chunrong, Li, Jun, Qian, Hanwei, Wu, Cong, Zhang, Quanjun, Liu, Yang, Chen, Zhenyu

Mar-6-2024–arXiv.org Artificial Intelligence

In this paper, we study distillation as a defense against poisoned encoders in self-supervised learning (SSL); the technique was originally used as a defense in supervised learning. Distillation extracts knowledge from a given model (a.k.a. the teacher net) and transfers it to another (a.k.a. the student net). Here, we use it to distill benign knowledge from a poisoned pre-trained encoder and transfer it to a new encoder, yielding a clean pre-trained encoder. In particular, we conduct an empirical study of the effectiveness and performance of distillation against poisoned encoders. Using two state-of-the-art backdoor attacks against pre-trained image encoders and four commonly used image classification datasets, our experimental results show that distillation can reduce the attack success rate from 80.87% to 27.51% at the cost of a 6.35% loss in accuracy. Moreover, we investigate how three core components of distillation affect performance: the teacher net, the student net, and the distillation loss. Comparing 4 teacher nets, 3 student nets, and 6 distillation losses, we find that fine-tuned teacher nets, warm-up-training-based student nets, and attention-based distillation losses perform best in their respective categories.
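To make the attention-based distillation loss mentioned in the abstract concrete, here is a minimal NumPy sketch. It assumes one common formulation (attention transfer), in which each feature map is collapsed into a spatial attention map by averaging squared activations over channels, and the student is penalized for deviating from the teacher's normalized map. The abstract does not specify the exact loss used in the paper, so the function names `attention_map` and `attention_distillation_loss`, the tensor shapes, and the normalization are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def attention_map(features, eps=1e-12):
    """Collapse (batch, channels, height, width) activations into
    L2-normalized spatial attention vectors of shape (batch, height*width).

    Assumption: attention = channel-wise mean of squared activations.
    """
    amap = np.mean(features ** 2, axis=1)            # (batch, H, W)
    flat = amap.reshape(amap.shape[0], -1)           # (batch, H*W)
    norm = np.linalg.norm(flat, axis=1, keepdims=True)
    return flat / (norm + eps)


def attention_distillation_loss(teacher_feats, student_feats):
    """Mean squared difference between teacher and student attention maps.

    The two nets may have different channel counts, but their spatial
    sizes must match, because the channel axis is averaged away.
    """
    diff = attention_map(teacher_feats) - attention_map(student_feats)
    return float(np.mean(diff ** 2))


# Hypothetical intermediate activations from a teacher and a student encoder.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 64, 8, 8))
student = rng.normal(size=(4, 32, 8, 8))

loss_same = attention_distillation_loss(teacher, teacher)
loss_diff = attention_distillation_loss(teacher, student)
```

In a training loop, a term like this would typically be summed over several intermediate layers and minimized with respect to the student's parameters only, with the teacher held fixed. Because the maps are normalized, the loss depends on where the activations concentrate spatially, not on their overall magnitude.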



Country:
- Oceania > Australia
  - New South Wales (0.04)
- North America
  - United States
    - Oregon (0.04)
    - Florida > Broward County
      - Fort Lauderdale (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Austria (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
- Asia
  - Singapore (0.14)
  - South Korea > Seoul
    - Seoul (0.04)
  - China
    - Jiangsu Province > Nanjing (0.05)
    - Hubei Province > Wuhan (0.04)
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)

Genre:
- Research Report > New Finding (0.87)

Industry:
- Information Technology > Security & Privacy (1.00)
- Education (1.00)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Security & Privacy (1.00)
  - Data Science (1.00)
  - Artificial Intelligence
    - Natural Language (0.92)
    - Vision (0.89)
    - Machine Learning > Neural Networks
      - Deep Learning (0.68)
