Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
Yuxin Wen
Neural Information Processing Systems
It is now common to produce domain-specific models by fine-tuning large pre-trained models on a small bespoke dataset. But selecting one of the many foundation models available on the web carries considerable risk, including the possibility that the model has been backdoored. In this paper, we introduce a new type of model backdoor: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data is leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
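To make the threat model concrete, below is a minimal sketch of the kind of black-box, loss-threshold membership inference test that a privacy backdoor is designed to amplify. The `model`, `examples`, and `threshold` names are illustrative assumptions, not the paper's implementation; the backdoor itself is assumed to have been planted in the pre-trained weights before fine-tuning.

```python
# Sketch of a loss-threshold membership inference test (assumed setup:
# a fine-tuned classifier `model` and candidate (input, label) pairs).
# A privacy backdoor aims to widen the loss gap between members and
# non-members, making this simple test far more accurate.
import torch
import torch.nn.functional as F


@torch.no_grad()
def membership_scores(model, examples, device="cpu"):
    """Return per-example cross-entropy losses; lower loss suggests membership."""
    model.eval()
    scores = []
    for x, y in examples:  # each item: (input tensor, integer class label)
        logits = model(x.unsqueeze(0).to(device))
        loss = F.cross_entropy(logits, torch.tensor([y], device=device))
        scores.append(loss.item())
    return scores


def infer_membership(model, examples, threshold):
    """Predict 'member' when the loss falls below a calibrated threshold."""
    return [s < threshold for s in membership_scores(model, examples, )]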