clean sample
ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP
In this work, we propose an innovative test-time poisoned sample detection framework that hinges on the interpretability of model predictions, grounded in the semantic meaning of inputs. We contend that triggers (e.g., infrequent words) are not supposed to fundamentally alter the underlying semantic meanings of poisoned samples as they want to
Supplementary Material of "BackdoorBench: A Comprehensive Benchmark of Backdoor Learning"
A.1 Descriptions of backdoor attack algorithms
In addition to the basic information in Table 1 of the main manuscript, here we describe the general idea of eight implemented backdoor attack algorithms in BackdoorBench, as follows.
A.2 Descriptions of backdoor defense algorithms
In addition to the basic information in Table 2 of the main manuscript, here we describe the general idea of nine implemented backdoor defense algorithms in BackdoorBench, as follows.
It is used to determine the number of pruned neurons.
Running environments
Our evaluations are conducted on GPU servers with 2 Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz, RTX3090 GPU (32GB) and 320 GB RAM (2666MHz). With these hyper-
Table 2: Hyper-parameter settings of all implemented defense methods.