Defending Against Backdoor Attacks by Layer-wise Feature Analysis

Najeeb Moharram Jebreel, Josep Domingo-Ferrer, Yiming Li

Feb-24-2023–arXiv.org Artificial Intelligence

Training deep neural networks (DNNs) usually requires massive training data and computational resources. Users who cannot afford this may prefer to outsource training to a third party or to rely on publicly available pre-trained models. Unfortunately, doing so opens the door to backdoor attacks, a training-time threat against DNNs that aims to make the model misclassify inputs containing adversary-specified trigger patterns. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find that the feature difference between benign and poisoned samples tends to peak at a critical layer, which is not always the layer existing defenses typically use, namely the one immediately before the fully-connected layers. We also show how to locate this critical layer based on the behavior of benign samples. We then propose a simple yet effective method that filters poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer. Extensive experiments on two benchmark datasets confirm the effectiveness of our defense.
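
The abstract does not specify how the critical layer is located or how the feature difference is measured, so the following is only a minimal sketch under stated assumptions. It measures the feature difference as cosine similarity to the centroid of a small set of benign target-class samples. It picks the critical layer as the one where the benign samples' mean similarity jumps the most relative to the previous layer, which is a hypothetical heuristic. It flags a suspicious sample when its similarity falls below a low quantile (assumed 5%) of the benign samples' own similarities. The helper names (`cosine_to_centroid`, `locate_critical_layer`, `flag_poisoned`), the quantile, and the synthetic four-layer features are illustrative and are not taken from the paper.

```python
import numpy as np


def cosine_to_centroid(features: np.ndarray, centroid: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of `features` and `centroid`."""
    denom = np.linalg.norm(features, axis=1) * np.linalg.norm(centroid)
    return (features @ centroid) / np.maximum(denom, 1e-12)


def locate_critical_layer(benign_by_layer: list) -> int:
    """Hypothetical heuristic (not necessarily the paper's criterion): return the
    layer where benign target-class samples become markedly more similar to
    their own centroid than at the previous layer."""
    mean_sims = [
        cosine_to_centroid(f, f.mean(axis=0)).mean() for f in benign_by_layer
    ]
    jumps = np.diff(mean_sims)  # jumps[i] compares layer i+1 with layer i
    return int(np.argmax(jumps)) + 1


def flag_poisoned(suspicious: np.ndarray, benign: np.ndarray,
                  quantile: float = 0.05) -> np.ndarray:
    """Assumed rule: flag suspicious samples whose similarity to the benign
    centroid is below the `quantile` of the benign samples' own similarities."""
    centroid = benign.mean(axis=0)
    threshold = np.quantile(cosine_to_centroid(benign, centroid), quantile)
    return cosine_to_centroid(suspicious, centroid) < threshold


# Synthetic demo: 4 "layers" of 16-d features for one target class.
rng = np.random.default_rng(0)
dim, n = 16, 200
target_dir = np.eye(dim)[0]   # direction benign target-class features move toward
trigger_dir = np.eye(dim)[1]  # direction the synthetic trigger activates


def benign_layers(count: int) -> list:
    strengths = [0.0, 0.5, 4.0, 5.0]  # class signal emerges sharply at layer 2
    return [s * target_dir + rng.normal(size=(count, dim)) for s in strengths]


benign = benign_layers(n)    # small clean reserved set
held_out = benign_layers(n)  # other clean samples, used to check false positives
# Poisoned samples: trigger-driven at layer 2, mapped onto the target class at layer 3.
poisoned = [rng.normal(size=(n, dim)),
            rng.normal(size=(n, dim)),
            4.0 * trigger_dir + rng.normal(size=(n, dim)),
            5.0 * target_dir + rng.normal(size=(n, dim))]

critical = locate_critical_layer(benign)
print("critical layer:", critical)
for layer in (critical, len(benign) - 1):
    caught = flag_poisoned(poisoned[layer], benign[layer]).mean()
    false_pos = flag_poisoned(held_out[layer], benign[layer]).mean()
    print(f"layer {layer}: poisoned flagged {caught:.2f}, benign flagged {false_pos:.2f}")
```

The synthetic poisoned features are deliberately built to coincide with the target class at the last layer, so filtering there should catch few of them, while filtering at the located critical layer should separate them. With a real network, the per-layer features would instead be collected from intermediate activations (for example, via forward hooks).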

artificial intelligence, benign sample, machine learning
