Automatic Calibration for Membership Inference Attack on Large Language Models

Saleh Zare Zade, Yao Qiang, Xiangyu Zhou, Hui Zhu, Mohammad Amin Roshani, Prashant Khanduri, Dongxiao Zhu

May-7-2025–arXiv.org Artificial Intelligence

Membership Inference Attacks (MIAs) have recently been employed to determine whether a specific text was part of the pre-training data of Large Language Models (LLMs). However, existing methods often misinfer non-members as members, leading to a high false positive rate, or depend on additional reference models for probability calibration, which limits their practicality. To overcome these challenges, we introduce a novel framework called Automatic Calibration Membership Inference Attack (ACMIA), which utilizes a tunable temperature to calibrate output probabilities effectively. This approach is inspired by our theoretical insights into maximum likelihood estimation during the pre-training of LLMs. We introduce ACMIA in three configurations designed to accommodate different levels of model access and increase the probability gap between members and non-members, improving the reliability and robustness of membership inference. Extensive experiments on various open-source LLMs demonstrate that our proposed attack is highly effective, robust, and generalizable, surpassing state-of-the-art baselines across three widely used benchmarks. Our code is available at: Github.

1 Introduction

Large Language Models (LLMs), pre-trained on massive text corpora, have shown impressive human-level language understanding, reasoning, and decision-making capabilities [4, 28, 1, 23]. However, their tendency to memorize training data also introduces significant ethical and security concerns [14, 31, 2, 21, 22].
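
To make the abstract's idea of temperature-based calibration concrete, the sketch below shows how a temperature changes the log-probabilities a model assigns to observed tokens, and how those values can be averaged into a simple membership score. It is a generic illustration, not the ACMIA scoring rule, which the text above does not specify. The function names `log_softmax` and `membership_score`, the toy logits, and the use of the mean token log-probability as the score are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax over one next-token distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def membership_score(
    token_logits: Sequence[Sequence[float]],
    token_ids: Sequence[int],
    temperature: float = 1.0,
) -> float:
    """Mean temperature-scaled log-probability of the observed tokens.

    Hypothetical score for illustration only: a higher value means the
    model finds the text more likely, which an attacker might read as
    weak evidence of membership.
    """
    if not token_ids or len(token_logits) != len(token_ids):
        raise ValueError("need one logit vector per observed token")
    total = 0.0
    for logits, tok in zip(token_logits, token_ids):
        total += log_softmax(logits, temperature)[tok]
    return total / len(token_ids)


# Toy input (assumed values): two positions over a three-token vocabulary.
toy_logits = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
toy_tokens = [0, 1]

score_t1 = membership_score(toy_logits, toy_tokens, temperature=1.0)
score_t2 = membership_score(toy_logits, toy_tokens, temperature=2.0)
```

In this toy input both observed tokens are the model's top choices, so raising the temperature flattens the distributions and lowers the score. An attack of this general shape would compare such a score against a threshold; how the temperature is tuned and how the three access configurations differ is left to the paper itself.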
