Context-Aware Membership Inference Attacks against Pre-trained Large Language Models

Hongyan Chang, Ali Shahin Shamsabadi, Kleomenis Katevas, Hamed Haddadi, Reza Shokri

arXiv.org Machine Learning 

To assess memorization and information leakage in models, Membership Inference Attacks (MIAs) aim to determine whether a data point was part of a model's training set [1]. However, MIAs designed for pre-trained Large Language Models (LLMs) have been largely ineffective [2, 3], primarily because these attacks, originally developed for classification models, fail to account for the sequential nature of LLMs. Unlike classification models, which produce a single prediction per input, LLMs generate text token by token, conditioning each prediction on the context of the preceding tokens (i.e., the prefix). Prior MIAs overlook token-level loss dynamics and the influence of prefixes on next-token predictability, a key factor in memorization.
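
To make the notion of token-level loss dynamics concrete, the sketch below computes the per-token negative log-likelihood of a candidate sequence under a causal LM, where each token's loss is conditioned on its prefix. This is a minimal illustration of the kind of signal the abstract refers to, not the paper's attack; the model choice (`gpt2`) and the helper name `token_level_losses` are assumptions for demonstration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice (assumption); any Hugging Face causal LM works.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def token_level_losses(text: str) -> list[tuple[str, float]]:
    """Return (token, negative log-likelihood) pairs for each token in
    `text`, with each token's loss conditioned on its preceding prefix."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)
    # Shift by one: logits at position t predict the token at position t+1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nlls = -log_probs[torch.arange(targets.size(0)), targets]
    tokens = tokenizer.convert_ids_to_tokens(targets.tolist())
    return list(zip(tokens, nlls.tolist()))


# Inspect how predictable each token is given its prefix; tokens that are
# unusually easy to predict in context can hint at memorization.
for tok, nll in token_level_losses("The quick brown fox jumps over the lazy dog."):
    print(f"{tok!r}: {nll:.3f}")
```

A classification-style MIA would collapse this sequence into a single aggregate loss; the per-token view preserved here is exactly the prefix-dependent structure that, per the abstract, prior attacks discard.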