Beyond Memorization: Violating Privacy Via Inference with Large Language Models

Staab, Robin, Vero, Mark, Balunović, Mislav, Vechev, Martin

Oct-11-2023, arXiv.org Artificial Intelligence

Current privacy research on large language models (LLMs) primarily focuses on the issue of extracting memorized training data. At the same time, models' inference capabilities have increased drastically. This raises the key question of whether current LLMs could violate individuals' privacy by inferring personal attributes from text given at inference time. In this work, we present the first comprehensive study on the capabilities of pretrained LLMs to infer personal attributes from text. We construct a dataset consisting of real Reddit profiles, and show that current LLMs can infer a wide range of personal attributes (e.g., location, income, sex), achieving up to $85\%$ top-1 and $95.8\%$ top-3 accuracy at a fraction of the cost ($100\times$ lower) and time ($240\times$ lower) required by humans. As people increasingly interact with LLM-powered chatbots across all aspects of life, we also explore the emerging threat of privacy-invasive chatbots that try to extract personal information through seemingly benign questions. Finally, we show that common mitigations, i.e., text anonymization and model alignment, are currently ineffective at protecting user privacy against LLM inference. Our findings highlight that current LLMs can infer personal data at a previously unattainable scale. In the absence of working defenses, we advocate for a broader discussion of LLM privacy implications beyond memorization, striving for broader privacy protection.
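The top-1 and top-3 figures quoted in the abstract are ranked-guess accuracies: a prediction counts as correct at top-k if the true attribute value appears among the model's first k guesses. The following is a minimal sketch of that metric only, not the authors' evaluation code. The function name `top_k_accuracy`, the case-insensitive exact-match rule, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_guesses: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of examples whose true label appears among the first k guesses."""
    if len(ranked_guesses) != len(true_labels):
        raise ValueError("need one list of guesses per true label")
    if not true_labels:
        raise ValueError("need at least one example")
    hits = 0
    for guesses, truth in zip(ranked_guesses, true_labels):
        # Assumed matching rule: case-insensitive exact string match.
        top_k = [g.strip().lower() for g in guesses[:k]]
        if truth.strip().lower() in top_k:
            hits += 1
    return hits / len(true_labels)


# Made-up toy data (hypothetical location guesses), not taken from the paper.
guesses = [
    ["Melbourne", "Sydney", "Auckland"],
    ["Berlin", "Vienna", "Zurich"],
    ["Toronto", "Chicago", "Boston"],
    ["Paris", "Lyon", "Brussels"],
]
truth = ["Melbourne", "Vienna", "Seattle", "Paris"]

top1 = top_k_accuracy(guesses, truth, k=1)
top3 = top_k_accuracy(guesses, truth, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

Top-3 accuracy can never be lower than top-1 accuracy on the same guesses, because every top-1 hit is also a top-3 hit. Exact string matching is the simplest possible rule; a real evaluation of free-text attribute guesses would likely need a more lenient comparison.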

accuracy, dataset, preprint

Country:
- South America
  - Brazil (0.04)
  - Chile > Santiago Metropolitan Region
    - Santiago Province > Santiago (0.04)
  - Argentina > Pampas
    - Buenos Aires F.D. > Buenos Aires (0.04)
- Oceania
  - Australia (0.04)
  - New Zealand > North Island
    - Auckland Region > Auckland (0.04)
- North America
  - Mexico (0.04)
  - United States
    - Alaska (0.04)
    - New York (0.04)
    - Maine (0.04)
    - Arkansas (0.04)
    - Minnesota (0.04)
    - Utah (0.04)
    - Nevada (0.04)
    - Indiana (0.04)
    - Missouri (0.04)
    - Wisconsin (0.04)
    - Arizona (0.04)
    - Oregon (0.04)
    - Connecticut (0.04)
    - Maryland (0.04)
    - Kansas (0.04)
    - Hawaii (0.04)
    - Michigan (0.04)
    - Rocky Mountains (0.04)
    - Tennessee (0.04)
    - Iowa (0.04)
    - Oklahoma (0.04)
    - Louisiana (0.04)
    - Vermont (0.04)
    - Nebraska (0.04)
    - Idaho (0.04)
    - Virginia (0.04)
    - Colorado (0.04)
    - Pennsylvania (0.04)
    - Wyoming (0.04)
    - Kentucky (0.04)
    - Mississippi (0.04)
    - Massachusetts (0.04)
    - Montana (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Ohio > Cuyahoga County
      - Cleveland (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
    - Illinois > Cook County
      - Chicago (0.04)
  - Canada
    - Rocky Mountains (0.04)
    - Quebec > Montreal (0.04)
    - Ontario
      - Toronto (0.04)
      - Wellington County > Guelph (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
    - Alberta > Census Division No. 6
      - Calgary Metropolitan Region > Calgary (0.04)
- Europe
  - France (0.04)
  - Belgium > Flanders (0.04)
  - Italy (0.04)
  - Sweden (0.04)
  - Poland (0.04)
  - Iceland (0.04)
  - Denmark (0.04)
  - Finland (0.04)
  - Norway (0.04)
  - Ireland (0.04)
  - Hungary (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
  - Spain > Galicia
    - Madrid (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Austria
    - Vienna (0.04)
    - Styria > Graz (0.04)
  - Germany
    - Berlin (0.04)
    - Bavaria > Upper Bavaria
      - Munich (0.04)
  - Switzerland
    - Zürich > Zürich (0.04)
    - Basel-City > Basel (0.04)
  - Greece > Attica
    - Athens (0.04)
  - United Kingdom
    - Wales (0.04)
    - Scotland (0.04)
    - England > Greater London
      - London (0.04)
- Asia
  - Singapore (0.04)
  - Russia (0.04)
  - Middle East > Republic of Türkiye (0.04)
  - Malaysia (0.04)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture
      - Tokyo (0.04)
    - Kansai > Osaka Prefecture
      - Osaka (0.04)
  - India
    - Maharashtra > Mumbai (0.04)
    - Karnataka > Bengaluru (0.04)
  - China
    - Shanghai > Shanghai (0.04)
    - Guangdong Province > Guangzhou (0.04)
    - Beijing > Beijing (0.04)
- Africa > South Africa
  - Gauteng > Johannesburg (0.04)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government
  - North America Government > United States Government (0.92)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Memory-Based Learning > Rote Learning (0.60)
    - Neural Networks > Deep Learning (0.47)
