AITopics | purified model

Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Neural Information Processing Systems, Mar-21-2026, 13:50:05 GMT

Backdoor attacks pose a significant threat to Deep Neural Networks (DNNs) as they allow attackers to manipulate model predictions with backdoor triggers. To address these security vulnerabilities, various backdoor purification methods have been proposed to purify compromised models.

artificial intelligence, machine learning, proceedings, (11 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (0.59)

Technology:

Information Technology > Security & Privacy (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)


Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Neural Information Processing Systems, Feb-16-2026, 13:52:43 GMT

However, does achieving a low ASR through current safety purification methods truly eliminate learned backdoor features from the pretraining phase? In this paper, we provide an affirmative answer to this question by thoroughly investigating the Post-Purification Robustness of current backdoor purification methods.

data mining, machine learning, purified model, (20 more...)

Neural Information Processing Systems

Country:

Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
North America > United States > Pennsylvania (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Data Science > Data Mining (0.67)


Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples

Neural Information Processing Systems, Dec-25-2025, 06:10:23 GMT

Backdoor attacks are serious security threats to machine learning models where an adversary can inject poisoned samples into the training set, causing a backdoored model which predicts poisoned samples with particular triggers to particular target classes, while behaving normally on benign samples. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk, which mainly captures the risk on the shared adversarial examples (SAEs) between the backdoored model and the purified model. This upper bound further suggests a novel bi-level optimization problem for mitigating backdoor using adversarial training techniques. To solve it, we propose Shared Adversarial Unlearning (SAU). Specifically, SAU first generates SAEs, and then, unlearns the generated SAEs such that they are either correctly classified by the purified model and/or differently classified by the two models, such that the backdoor effect in the backdoored model will be mitigated in the purified model. Experiments on various benchmark datasets and network architectures show that our proposed method achieves state-of-the-art performance for backdoor defense.
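As a rough illustration of the unlearning goal stated in the abstract above, the following Python sketch checks whether an adversarial example is correctly classified by the purified model or classified differently by the two models. The function names and the integer-label representation are assumptions made for this illustration. It is not the authors' SAU implementation, which also involves generating the SAEs and solving a bi-level optimization problem, neither of which is shown here.

```python
from typing import Sequence


def sae_goal_met(true_label: int, backdoored_pred: int, purified_pred: int) -> bool:
    """Check the abstract's stated goal for one adversarial example (hypothetical helper)."""
    # Goal 1: the purified model classifies the example correctly.
    correct = purified_pred == true_label
    # Goal 2: the purified model disagrees with the backdoored model.
    different = purified_pred != backdoored_pred
    return correct or different


def fraction_still_shared(
    true_labels: Sequence[int],
    backdoored_preds: Sequence[int],
    purified_preds: Sequence[int],
) -> float:
    """Fraction of examples for which neither goal holds (hypothetical helper)."""
    triples = list(zip(true_labels, backdoored_preds, purified_preds))
    if not triples:
        raise ValueError("no examples given")
    unmet = sum(1 for y, b, p in triples if not sae_goal_met(y, b, p))
    return unmet / len(triples)
```

An example counts as still shared only when the purified model is wrong and agrees with the backdoored model's prediction.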

backdoor mitigation, shared adversarial unlearning, unlearning shared adversarial example, (6 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (0.97)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


8e8399e5e7aed601c9f135f40be26564-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 09:20:12 GMT

poisoning rate, purified model, robustness, (15 more...)

Neural Information Processing Systems

Country:

Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
North America > United States > Pennsylvania (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Data Science > Data Mining (0.67)


Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Neural Information Processing Systems, May-27-2025, 08:31:37 GMT

Backdoor attacks pose a significant threat to Deep Neural Networks (DNNs) as they allow attackers to manipulate model predictions with backdoor triggers. To address these security vulnerabilities, various backdoor purification methods have been proposed to purify compromised models. However, does achieving a low ASR through current safety purification methods truly eliminate learned backdoor features from the pretraining phase? In this paper, we provide an affirmative answer to this question by thoroughly investigating the Post-Purification Robustness of current backdoor purification methods. We find that current safety purification methods are vulnerable to the rapid re-learning of backdoor behavior, even when further fine-tuning of purified models is performed using a very small number of poisoned samples.

post-purification robustness, purification method, superficial safety, (9 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (0.61)

Technology:

Information Technology > Security & Privacy (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)


Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples

Neural Information Processing Systems, Jan-18-2025, 03:41:19 GMT

Backdoor attacks are serious security threats to machine learning models where an adversary can inject poisoned samples into the training set, causing a backdoored model which predicts poisoned samples with particular triggers to particular target classes, while behaving normally on benign samples. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk, which mainly captures the risk on the shared adversarial examples (SAEs) between the backdoored model and the purified model. This upper bound further suggests a novel bi-level optimization problem for mitigating backdoor using adversarial training techniques. To solve it, we propose Shared Adversarial Unlearning (SAU).

backdoor mitigation, shared adversarial unlearning, unlearning shared adversarial example, (4 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Min, Rui, Qin, Zeyu, Zhang, Nevin L., Shen, Li, Cheng, Minhao

arXiv.org Artificial Intelligence, Oct-16-2024

Backdoor attacks pose a significant threat to Deep Neural Networks (DNNs) as they allow attackers to manipulate model predictions with backdoor triggers. To address these security vulnerabilities, various backdoor purification methods have been proposed to purify compromised models. Typically, these purified models exhibit low Attack Success Rates (ASR), rendering them resistant to backdoored inputs. However, does achieving a low ASR through current safety purification methods truly eliminate learned backdoor features from the pretraining phase? In this paper, we provide an affirmative answer to this question by thoroughly investigating the Post-Purification Robustness of current backdoor purification methods. We find that current safety purification methods are vulnerable to the rapid re-learning of backdoor behavior, even when further fine-tuning of purified models is performed using a very small number of poisoned samples. Based on this, we further propose the practical Query-based Reactivation Attack (QRA) which could effectively reactivate the backdoor by merely querying purified models. We find the failure to achieve satisfactory post-purification robustness stems from the insufficient deviation of purified models from the backdoored model along the backdoor-connected path. To improve the post-purification robustness, we propose a straightforward tuning defense, Path-Aware Minimization (PAM), which promotes deviation along backdoor-connected paths with extra model updates. Extensive experiments demonstrate that PAM significantly improves post-purification robustness while maintaining a good clean accuracy and low ASR. Our work provides a new perspective on understanding the effectiveness of backdoor safety tuning and highlights the importance of faithfully assessing the model's safety.
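The Attack Success Rate mentioned above is commonly computed as the fraction of trigger-stamped inputs that the model assigns to the attacker's target class, excluding inputs whose true label already equals the target. The abstract does not spell out this formula, so the definition, the function name, and the argument names in the Python sketch below are assumptions for illustration only.

```python
from typing import Sequence


def attack_success_rate(
    preds_on_triggered: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered, non-target-class inputs predicted as the target class.

    Assumed (commonly used) definition; not taken from the abstract itself.
    """
    # Keep only inputs whose true class differs from the attacker's target,
    # so that correct predictions are not counted as attack successes.
    kept = [p for p, y in zip(preds_on_triggered, true_labels) if y != target_label]
    if not kept:
        raise ValueError("no triggered inputs outside the target class")
    hits = sum(1 for p in kept if p == target_label)
    return hits / len(kept)
```

A low value on a purified model is what the abstract calls low ASR. The paper's point is that this number alone does not show that the learned backdoor features have been eliminated.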

poisoning rate, purified model, robustness, (16 more...)

arXiv.org Artificial Intelligence

2410.09838

Country:

Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
North America > United States > Pennsylvania (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science (0.93)

