Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage

Rashid, Md Rafi Ur, Liu, Jing, Koike-Akino, Toshiaki, Mehnaz, Shagufta, Wang, Ye

Aug-30-2024–arXiv.org Artificial Intelligence

Fine-tuning large language models on private data for downstream applications poses significant privacy risks, because it can expose sensitive information. Several popular community platforms now make it easy to distribute a wide variety of pre-trained models, allowing anyone to publish without rigorous verification. This creates a privacy threat: pre-trained models can be intentionally crafted to compromise the privacy of fine-tuning datasets. In this study, we introduce a novel poisoning technique that uses model unlearning as an attack tool. The approach manipulates a pre-trained language model so that it leaks more private data during fine-tuning. Our method enhances both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups demonstrate that our attacks significantly outperform baseline attacks. This work serves as a cautionary note for users who download pre-trained models from unverified sources, highlighting the potential risks involved.
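To make "membership inference" concrete, the sketch below shows the generic loss-threshold attack that is commonly used as a baseline. It assumes the attacker already has the fine-tuned model's per-example loss on candidate texts. The abstract does not say which attack variant the authors use, so treat this as an illustration of the idea rather than their method. The function names (`loss_threshold_mia`, `auc_from_losses`, `tpr_fpr`) and the toy loss values are hypothetical.

```python
from typing import Sequence


def loss_threshold_mia(losses: Sequence[float], threshold: float) -> list[bool]:
    """Predict 'member of the fine-tuning set' when the model's loss is below threshold.

    Hypothetical helper: fine-tuned models tend to assign lower loss to
    examples they were trained on, which is the signal this attack exploits.
    """
    return [loss < threshold for loss in losses]


def auc_from_losses(member_losses: Sequence[float],
                    nonmember_losses: Sequence[float]) -> float:
    """ROC AUC of the score -loss, computed as the Mann-Whitney statistic.

    Equals the probability that a random member has lower loss than a
    random non-member; ties count as one half.
    """
    wins = 0.0
    for m in member_losses:
        for n in nonmember_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_losses) * len(nonmember_losses))


def tpr_fpr(member_losses: Sequence[float],
            nonmember_losses: Sequence[float],
            threshold: float) -> tuple[float, float]:
    """True-positive and false-positive rates of the attack at one threshold."""
    tp = sum(loss_threshold_mia(member_losses, threshold))
    fp = sum(loss_threshold_mia(nonmember_losses, threshold))
    return tp / len(member_losses), fp / len(nonmember_losses)


if __name__ == "__main__":
    # Toy per-example losses, invented for illustration (not results from the paper).
    members = [0.8, 1.1, 0.9, 1.5]
    nonmembers = [1.4, 2.0, 1.7, 1.2]
    print(auc_from_losses(members, nonmembers))  # 0.875
    print(tpr_fpr(members, nonmembers, 1.3))     # (0.75, 0.25)
```

In these terms, a poisoned pre-trained model that "increases leakage" is one that widens the gap between member and non-member losses after fine-tuning, which pushes the AUC and the true-positive rate at a fixed false-positive rate upward.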

large language model, machine learning, natural language

Country:
- North America
  - United States > Pennsylvania (0.04)
  - Canada > Ontario
    - Toronto (0.04)

Genre:
- Research Report (0.84)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Artificial Intelligence
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)