Goto

Collaborating Authors

 clean sample




ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP

Neural Information Processing Systems

In this work, we propose an innovative test-time poisoned sample detection framework that hinges on the in-terpretability of model predictions, grounded in the semantic meaning of inputs. We contend that triggers (e.g., infrequent words) are not supposed to fundamentally alter the underlying semantic meanings of poisoned samples as they want to






Supplementary Material of " BackdoorBench: A Comprehensive Benchmark of Backdoor Learning "

Neural Information Processing Systems

A.1 Descriptions of backdoor attack algorithms In addition to the basic information in Table 1 of the main manuscript, here we describe the general idea of eight implemented backdoor attack algorithms in BackdoorBench, as follows. A.2 Descriptions of backdoor defense algorithms In addition to the basic information in Table 2 of the main manuscript, here we describe the general idea of nine implemented backdoor defense algorithms in BackdoorBench, as follows. It is used to determine the number of pruned neurons. Running environments Our evaluations are conducted on GPU servers with 2 Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz, RTX3090 GPU (32GB) and 320 GB RAM (2666MHz). With these hyper-3 Table 2: Hyper-parameter settings of all implemented defense methods.