Goto

Collaborating Authors

 backdoor attack


Setup in Detail

Neural Information Processing Systems

We implement our attack framework using Python 3.7.3 and PyTorch 1.7.13 that supports CUDA 11.0 for accelerating computations by using GPUs. We run our experiments on a machine equipped with Intel i5-8400 2.80GHz 6-core processors, 16 GB of RAM, and four Nvidia GTX 1080 Ti GPUs. To compute the Hessian trace, we use a virtual machine equipped with Intel E5-2686v4 2.30GHz 8-core processors, 64 GB of RAM, and an Nvidia Tesla V100 GPU. For all our attacks in 4.1, 4.2, 4.3, and 4.5, we use symmetric quantization for the weights and asymmetric quantization for the activation--a default configuration in many deep learning frameworks supporting quantization. Quantization granularity is layer-wise for both the weights and activation.



16d11e9595188dbad0418a85f0351aba-Supplemental.pdf

Neural Information Processing Systems

This section introduces more backgrounds on poisoning attacks and backdoor attacks, and details on the adversarial attacks that we use to craft accumulative poisoning samples in our methods. Finally, we describe the commonly used anomaly detection methods against adversarially crafted samples, following previous settings [40]. B.1 Poisoning attacks and backdoor attacks There is extensive prior work on poisoning attacks, especially in the offline settings against SVM [3], logistic regression [36], collaborative filtering [27], feature selection [54], clustering [8], and neural networks [9, 21, 22, 38, 50]. Poisoning attacks in real-time data streaming are studied on online SVM [4], autoregressive models [1, 7], bandit algorithms [20, 31, 33], and classification [26, 52, 57]. Compared to poisoning attacks, backdoor attacks draw attention in more recent researches.



the Fine tuning Process of on Poisoned

Neural Information Processing Systems

In this section, we show our empirical observations obtained from fine-tuning PLMs on poisoned494 datasets. Specifically, we demonstrate that the backdoor triggers are easier to learn from the lower495 layers than the features corresponding to the main task. This observation plays a pivotal role in496 designing and understanding our defense algorithm. In our experiment, we focus on the SST-2497 dataset [30] and consider the widely adopted word-level backdoor trigger and the more stealthy498 style-level trigger. For the word-level trigger, we follow the approach in prior work [25] and adopt the499 meaningless word "bb" as the trigger to minimize its impact on the original text's semantic meaning.500



Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models

Neural Information Processing Systems

It is commonplace to produce application-specific models by fine-tuning large pre-trained models using a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including the vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a re-evaluation of safety protocols in the use of open-source pre-trained models.




Malicious client Benign client Subspace distributionModel distribution

Neural Information Processing Systems

This poison-coupling the modifies poison-coupling paper the presents training effect Lockdo ef protocol in fect. FL, wn, which Lockdo by an isolating isolated significantly wn follo subspace the ws de training three grades training ke the subspaces y procedures.