Detection Framework for Inference-Stage Backdoor Defenses

Apr-25-2026, 09:56:56 GMT–Neural Information Processing Systems

Backdoor attacks insert poisoned samples into the training data, producing a model with a hidden backdoor: the model performs normally on clean samples but exhibits attacker-specified behavior when an input contains the backdoor trigger. Because the backdoored model appears normal until the trigger is present, these attacks are particularly stealthy and hard to detect. In this study, we devise a unified inference-stage detection framework to defend against backdoor attacks. We first rigorously formulate the inference-stage backdoor detection problem in a way that encompasses various existing methods, and we discuss several of its challenges and limitations. We then propose a framework with provable guarantees on the false positive rate, that is, the probability of misclassifying a clean sample as a backdoor sample. Further, under classical learning scenarios, we derive the most powerful detection rule, which maximizes the detection power (the rate of correctly identifying backdoor samples) at a given false positive rate.
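
To make the idea of a provable false-positive-rate guarantee concrete, the following is a minimal sketch using split-conformal calibration, a standard way to obtain this kind of guarantee. It is an illustration under stated assumptions, not the authors' procedure: it assumes some detector score where larger values look more like backdoor inputs (the score itself is left abstract), a held-out set of scores computed on clean calibration samples, and that a clean test input is exchangeable with those calibration samples. The function names `calibrate_threshold` and `is_backdoor` are hypothetical. The sketch controls only the false positive rate; maximizing detection power at that rate, as the most powerful rule in the abstract does, depends on the choice of score, which is not modeled here.

```python
import math
from typing import Sequence


def calibrate_threshold(clean_scores: Sequence[float], alpha: float) -> float:
    """Return a threshold tau from clean calibration scores (hypothetical helper).

    Assumption: a clean test score is exchangeable with the calibration scores.
    Then P(clean test score > tau) <= alpha, i.e. the false positive rate is
    at most alpha. tau is the k-th smallest calibration score, with the
    split-conformal rank k = ceil((n + 1) * (1 - alpha)).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(clean_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few clean samples to certify this alpha: never flag anything.
        return math.inf
    return sorted(clean_scores)[k - 1]


def is_backdoor(score: float, tau: float) -> bool:
    """Flag an input as a suspected backdoor sample when its score exceeds tau."""
    return score > tau
```

For example, with nine clean calibration scores and alpha = 0.5, the rank is k = ceil(10 × 0.5) = 5, so the threshold is the fifth-smallest calibration score. The guarantee follows from a rank argument: under exchangeability, a clean test score exceeds the k-th smallest of n calibration scores with probability at most (n + 1 − k)/(n + 1) ≤ alpha.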

artificial intelligence, machine learning, upper bound
