Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
Yuxin Wen
Neural Information Processing Systems
It is now common to produce domain-specific models by fine-tuning large pre-trained models on a small bespoke dataset. But selecting one of the many foundation models available on the web carries considerable risk, including the possibility that the model has been backdoored. In this paper, we introduce a new type of model backdoor: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data is leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
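To make the threat model concrete, below is a minimal sketch of the kind of black-box, loss-threshold membership inference test that a privacy backdoor is designed to amplify. The `model`, `examples`, and `threshold` names are illustrative assumptions, not the paper's implementation; the backdoor itself is assumed to have been planted in the pre-trained weights before fine-tuning.

```python
# Sketch of a loss-threshold membership inference test (assumed setup:
# a fine-tuned classifier `model` and candidate (input, label) pairs).
# A privacy backdoor aims to widen the loss gap between members and
# non-members, making this simple test far more accurate.
import torch
import torch.nn.functional as F


@torch.no_grad()
def membership_scores(model, examples, device="cpu"):
    """Return per-example cross-entropy losses; lower loss suggests membership."""
    model.eval()
    scores = []
    for x, y in examples:  # each item: (input tensor, integer class label)
        logits = model(x.unsqueeze(0).to(device))
        loss = F.cross_entropy(logits, torch.tensor([y], device=device))
        scores.append(loss.item())
    return scores


def infer_membership(model, examples, threshold):
    """Predict 'member' when the loss falls below a calibrated threshold."""
    return [s < threshold for s in membership_scores(model, examples, )]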