Context-Aware Membership Inference Attacks against Pre-trained Large Language Models

arXiv.org Machine Learning

To assess memorization and information leakage in models, Membership Inference Attacks (MIAs) aim to determine if a data point was part of a model's training set [1]. However, MIAs designed for pre-trained Large Language Models (LLMs) have been largely ineffective [2, 3]. This is primarily because these MIAs, originally developed for classification models, fail to account for the sequential nature of LLMs. Unlike classification models, which produce a single prediction, LLMs generate text token by token, adjusting each prediction based on the context of preceding tokens (i.e., the prefix). Prior MIAs overlook token-level loss dynamics and the influence of prefixes on next-token predictability, which contributes to memorization.
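To make the distinction concrete, the following is a minimal sketch of the difference between a single averaged loss and the per-token loss sequence that a context-aware attack could inspect. It is not the attack described in the paper: the function names, the toy logits, and the three-token vocabulary are all hypothetical, and a real setting would take the logits from an actual language model.

```python
import math


def per_token_losses(logits, token_ids):
    """Return the cross-entropy loss of each token given its prefix.

    Assumption for this sketch: logits[t] is the score vector over the
    vocabulary that the model produced for position t after reading the
    prefix token_ids[:t], and token_ids[t] is the token actually observed.
    """
    losses = []
    for scores, target in zip(logits, token_ids):
        # Numerically stable log-sum-exp over the vocabulary.
        max_score = max(scores)
        log_z = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        # Negative log-probability of the observed token.
        losses.append(log_z - scores[target])
    return losses


def average_loss(losses):
    """Collapse the sequence into one scalar, discarding token-level dynamics."""
    return sum(losses) / len(losses)


# Hypothetical toy input: three positions, vocabulary of size three.
toy_logits = [
    [0.0, 0.0, 0.0],  # no context yet: the model is maximally uncertain
    [2.0, 0.0, 0.0],  # a short prefix makes token 0 more likely
    [6.0, 0.0, 0.0],  # a longer prefix makes token 0 very likely
]
toy_tokens = [0, 0, 0]

token_losses = per_token_losses(toy_logits, toy_tokens)
sequence_score = average_loss(token_losses)
```

In this toy input the per-token losses fall as the prefix grows, a trend that is invisible once the sequence is reduced to `sequence_score`. That lost information is the kind of token-level signal the abstract says prior MIAs overlook.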