Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples

Jan-18-2025, 03:41:19 GMT – Neural Information Processing Systems

Backdoor attacks pose a serious security threat to machine learning models: an adversary injects poisoned samples into the training set, yielding a backdoored model that classifies samples carrying a particular trigger into a particular target class while behaving normally on benign samples. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk, which mainly captures the risk on the shared adversarial examples (SAEs) between the backdoored model and the purified model. This upper bound further suggests a novel bi-level optimization problem for mitigating backdoors using adversarial training techniques. To solve it, we propose Shared Adversarial Unlearning (SAU).
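
To make the notion of a shared adversarial example more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes (the abstract does not spell this out) that an SAE is a perturbed input that the backdoored model and the purified model both assign to the same wrong label. The linear toy models, the grid search over an L-infinity ball, and all function names are hypothetical choices made for illustration only.

```python
from itertools import product


def predict(weights, x):
    """Argmax class of a toy linear model; weights[c] is the weight vector of class c."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def is_shared_adversarial(x_adv, y, backdoored, purified):
    """Assumed criterion: both models agree on x_adv, and the agreed label is not y."""
    pred_bd = predict(backdoored, x_adv)
    pred_pu = predict(purified, x_adv)
    return pred_bd == pred_pu and pred_bd != y


def find_shared_adversarial(x, y, backdoored, purified, eps=1.0, steps=4):
    """Brute-force search over a grid in the L-infinity ball of radius eps around x.

    Returns the first perturbed input that satisfies the assumed SAE criterion,
    or None if no grid point does. Real attacks would use gradient-based search.
    """
    grid = [-eps + 2 * eps * k / steps for k in range(steps + 1)]
    for delta in product(grid, repeat=len(x)):
        x_adv = [x_i + d_i for x_i, d_i in zip(x, delta)]
        if is_shared_adversarial(x_adv, y, backdoored, purified):
            return x_adv
    return None


# Hypothetical two-class toy models over 2-D inputs.
backdoored_model = [[1.0, 0.0], [0.0, 1.0]]
purified_model = [[1.0, 0.0], [0.0, 0.5]]

clean_x, clean_y = [1.0, 0.5], 0
sae = find_shared_adversarial(clean_x, clean_y, backdoored_model, purified_model)
print("shared adversarial example:", sae)
```

In this reading, a perturbation that fools only one of the two models does not count as shared, so an unlearning step built on the criterion would only target inputs on which the purified model still follows the backdoored model to the same wrong label.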

backdoor mitigation, shared adversarial unlearning, unlearning shared adversarial example

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)