Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions

Michele Miranda, Elena Sofia Ruzzetti, Andrea Santilli, Fabio Massimo Zanzotto, Sébastien Bratières, Emanuele Rodolà

Aug-10-2024–arXiv.org Artificial Intelligence

Large Language Models (LLMs) represent a significant advancement in artificial intelligence, with applications across many domains. However, their reliance on massive internet-sourced training datasets raises notable privacy concerns, which are exacerbated in critical domains (e.g., healthcare). Moreover, some application-specific scenarios may require fine-tuning these models on private data. This survey critically examines the privacy threats associated with LLMs, emphasizing the potential for these models to memorize and inadvertently reveal sensitive information. We explore current threats by reviewing privacy attacks on LLMs and propose comprehensive solutions for integrating privacy mechanisms throughout the entire learning pipeline. These solutions range from anonymizing training datasets, to applying differential privacy during training or inference, to machine unlearning after training. Our review of the existing literature highlights ongoing challenges, available tools, and future directions for preserving privacy in LLMs. This work aims to guide the development of more secure and trustworthy AI systems by providing a thorough understanding of privacy preservation methods and their effectiveness in mitigating risks.
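
Among the mitigations listed above is differential privacy during training. As a rough illustration only, the following sketch shows the core update of DP-SGD, a widely used way to train with differential privacy: each per-example gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to that norm is added, and the result is averaged before the step. It assumes a toy linear model with squared loss; the helper names (`clip`, `squared_loss_grad`, `dp_sgd_step`) and the parameter values are hypothetical. It omits privacy accounting (computing the resulting epsilon and delta), so it does not by itself certify any privacy guarantee, and it is not presented as the survey's own method.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Sequence[float], float]  # (features, target) for the toy model


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def clip(grad: Sequence[float], max_norm: float) -> Vector:
    """Rescale grad so its L2 norm is at most max_norm (unchanged if already smaller)."""
    norm = l2_norm(grad)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def squared_loss_grad(w: Sequence[float], x: Sequence[float], y: float) -> Vector:
    """Per-example gradient of 0.5 * (w . x - y)^2 with respect to w (hypothetical toy model)."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]


def dp_sgd_step(
    w: Sequence[float],
    batch: Sequence[Example],
    lr: float,
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> Vector:
    """One DP-SGD-style update: clip each per-example gradient to max_norm,
    sum the clipped gradients, add Gaussian noise with standard deviation
    noise_multiplier * max_norm to each coordinate, then average and step.
    No privacy accounting is performed here."""
    summed = [0.0] * len(w)
    for x, y in batch:
        g = clip(squared_loss_grad(w, x, y), max_norm)
        summed = [s + gi for s, gi in zip(summed, g)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [wi - lr * gi / len(batch) for wi, gi in zip(w, noisy)]


if __name__ == "__main__":
    # Illustrative run on two made-up examples; output varies with the noise.
    rng = random.Random(0)
    w = [0.0, 0.0]
    batch = [([1.0, 2.0], 3.0), ([2.0, 0.5], 1.0)]
    for _ in range(100):
        w = dp_sgd_step(w, batch, lr=0.1, max_norm=1.0, noise_multiplier=1.1, rng=rng)
    print(w)
```

Clipping bounds how far any single training example can move the parameters, which is what allows the added noise to mask an individual example's contribution; a larger noise multiplier generally strengthens privacy at the cost of utility.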

neural information processing system, pre-training dataset, security and privacy, (15 more...)

Country:
- South America
  - Brazil (0.04)
  - Colombia > Meta Department
    - Villavicencio (0.04)
  - Chile > Santiago Metropolitan Region
    - Santiago Province > Santiago (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Illinois > Cook County
      - Chicago (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Switzerland (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - Middle East > Malta
    - Port Region > Southern Harbour District > Floriana (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - Singapore (0.04)
  - Indonesia > Bali (0.04)
  - China > Hong Kong (0.04)
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
- Africa > South Sudan
  - Equatoria > Central Equatoria > Juba (0.04)

Genre:
- Research Report > New Finding (1.00)
- Overview (1.00)
- Workflow (0.92)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)