User Inference Attacks on Large Language Models
Kandpal, Nikhil, Pillutla, Krishna, Oprea, Alina, Kairouz, Peter, Choquette-Choo, Christopher A., Xu, Zheng
arXiv.org Artificial Intelligence
Successfully applying large language models (LLMs) to real-world problems is often best achieved by fine-tuning on domain-specific data (Liu et al., 2022; Mosbach et al., 2023). This approach underlies a variety of commercial products deployed today, such as GitHub Copilot (Chen et al., 2021), Gmail Smart Compose (Chen et al., 2019), and GBoard (Xu et al., 2023), which are based on LMs trained or fine-tuned on domain-specific data collected from users. The practice of fine-tuning on user data, particularly on sensitive data like emails, texts, or source code, comes with privacy concerns, as LMs have been shown to leak information from their training data (Carlini et al., 2021), especially as models are scaled larger (Carlini et al., 2023). In this paper, we study the privacy risks posed to users whose data are used to fine-tune LLMs. Most existing privacy attacks on LLMs fall into two categories: membership inference, in which the attacker has access to a sample and must determine whether it was part of the training data (Mireshghallah et al., 2022; Mattern et al., 2023; Niu et al., 2023); and extraction attacks, in which the attacker tries to reconstruct training data by prompting the model with different prefixes (Carlini et al., 2021; Lukas et al., 2023). These threat models make no assumptions about the structure of the training data and thus cannot estimate the privacy risk to a user who contributes many, likely correlated, training samples. To this end, we introduce the novel threat model of user inference, a relevant and realistic privacy attack vector for LLMs fine-tuned on user data, depicted in Figure 1.
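To make the user inference setting concrete, the sketch below shows one plausible instantiation of such an attack: per-sample likelihood-ratio scores between the fine-tuned model and a reference model are aggregated over a user's samples and compared to a threshold. The model names, helper functions, and aggregation choice are illustrative assumptions, not the paper's exact attack.

```python
# Hypothetical sketch of a user inference test statistic.
# Assumes the attacker can query both the fine-tuned model and a reference
# (pre-fine-tuning) model; all names below are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def avg_log_likelihood(model, tokenizer, text, device="cpu"):
    """Average per-token log-likelihood of `text` under `model`."""
    enc = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # `out.loss` is the mean per-token negative log-likelihood.
    return -out.loss.item()


def user_inference_score(ft_model, ref_model, tokenizer, user_samples, device="cpu"):
    """Aggregate likelihood-ratio scores over a user's samples.

    A large positive score indicates the fine-tuned model assigns this user's
    data systematically higher likelihood than the reference model, which is
    evidence that the user's data was part of the fine-tuning set.
    """
    scores = [
        avg_log_likelihood(ft_model, tokenizer, x, device)
        - avg_log_likelihood(ref_model, tokenizer, x, device)
        for x in user_samples
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Placeholder model paths; replace with the actual fine-tuned/reference models.
    tok = AutoTokenizer.from_pretrained("gpt2")
    ft = AutoModelForCausalLM.from_pretrained("path/to/fine-tuned-model")
    ref = AutoModelForCausalLM.from_pretrained("gpt2")
    samples = ["example email written by the target user",
               "another sample from the same user"]
    score = user_inference_score(ft, ref, tok, samples)
    print(f"user inference score: {score:.3f} (compare against a calibrated threshold)")
```

Unlike membership inference, the decision here is made at the user level by pooling evidence across many (possibly correlated) samples from the same user, which is what distinguishes the threat model described above.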
Oct-13-2023
- Country:
- North America > Canada > Ontario > Toronto (0.14)
- Genre:
- Research Report (0.64)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: