Goto

Collaborating Authors

 Rote Learning


Bayesian Perspective on Memorization and Reconstruction

arXiv.org Artificial Intelligence

Carlini et al. [2019] showed that it is sometimes possible to extract unique pieces of training data from modern language models (such as credit card numbers). This demonstrates that such models can unintentionally memorize rare parts of their training data, even if those parts appear only once. Since then, this memorization phenomenon has been studied in a long line of work, providing increasingly many examples in which modern models unintentionally memorize data. In fact, several follow-up papers have shown that there exist learning tasks for which memorization is provably necessary [Feldman, 2020, Feldman and Zhang, 2020, Carlini et al., 2021, Brown et al., 2021, Haim et al., 2022, Buzaglo et al., 2023, Carlini et al., 2023a,b]. However, these prior works did not converge on a single definition of memorization, and instead considered several context-dependent notions.


Dรฉjร  Vu Memorization in Visionโ€“Language Models

Neural Information Processing Systems

Vision-Language Models (VLMs) have emerged as the state-of-the-art representation learning solution, with myriads of downstream applications such as image classification, retrieval and generation. A natural question is whether these models memorize their training data, which also has implications for generalization. We propose a new method for measuring memorization in VLMs, which we call dรจjรก vu memorization. For VLMs trained on image-caption pairs, we show that the model indeed retains information about individual objects in the training images beyond what can be inferred from correlations or the image caption. We evaluate dรจjรก vu memorization at both sample and population level, and show that it is significant for OpenCLIP trained on as many as 50M image-caption pairs.


Can Test-time Computation Mitigate Memorization Bias in Neural Symbolic Regression?

arXiv.org Artificial Intelligence

Symbolic regression aims to discover mathematical equations that fit given numerical data. It has been applied in various fields of scientific research, such as producing human-readable expressions that explain physical phenomena. Recently, Neural symbolic regression (NSR) methods that involve Transformers pre-trained on large-scale synthetic datasets have gained attention. While these methods offer advantages such as short inference time, they suffer from low performance, particularly when the number of input variables is large. In this study, we hypothesized that this limitation stems from the memorization bias of Transformers in symbolic regression. We conducted a quantitative evaluation of this bias in Transformers using a synthetic dataset and found that Transformers rarely generate expressions not present in the training data. Additional theoretical analysis reveals that this bias arises from the Transformer's inability to construct expressions compositionally while verifying their numerical validity. We finally examined if tailoring test-time strategies can lead to reduced memorization bias and better performance. We empirically demonstrate that providing additional information to the model at test time can significantly mitigate memorization bias. On the other hand, we also find that reducing memorization bias does not necessarily correlate with improved performance. These findings contribute to a deeper understanding of the limitations of NSR approaches and offer a foundation for designing more robust, generalizable symbolic regression methods. Code is available at https://github.com/Shun-0922/Mem-Bias-NSR .


Positional Fragility in LLMs: How Offset Effects Reshape Our Understanding of Memorization Risks

arXiv.org Artificial Intelligence

Large language models are known to memorize parts of their training data, posing risk of copyright violations. To systematically examine this risk, we pretrain language models (1B/3B/8B) from scratch on 83B tokens, mixing web-scale data with public domain books used to simulate copyrighted content at controlled frequencies at lengths at least ten times longer than prior work. We thereby identified the offset effect, a phenomenon characterized by two key findings: (1) verbatim memorization is most strongly triggered by short prefixes drawn from the beginning of the context window, with memorization decreasing counterintuitively as prefix length increases; and (2) a sharp decline in verbatim recall when prefix begins offset from the initial tokens of the context window. We attribute this to positional fragility: models rely disproportionately on the earliest tokens in their context window as retrieval anchors, making them sensitive to even slight shifts. We further observe that when the model fails to retrieve memorized content, it often produces degenerated text. Leveraging these findings, we show that shifting sensitive data deeper into the context window suppresses both extractable memorization and degeneration. Our results suggest that positional offset is a critical and previously overlooked axis for evaluating memorization risks, since prior work implicitly assumed uniformity by probing only from the beginning of training sequences.


Generalizablity of Memorization Neural Network

Neural Information Processing Systems

The neural network memorization problem is to study the expressive power of neural networks to interpolate a finite dataset. Although memorization is widely believed to have a close relationship with the strong generalizability of deep learning when using overparameterized models, to the best of our knowledge, there exists no theoretical study on the generalizability of memorization neural networks. In this paper, we give the first theoretical analysis of this topic. First, algorithms are given to construct memorization networks for an i.i.d. Second, we show that, in order for the memorization networks to be generalizable, the width of the network must be at least equal to the dimension of the data, which implies that the existing memorization networks with an optimal number of parameters are not generalizable.


BiScope: AI-generated Text Detection by Checking Memorization of Preceding Tokens

Neural Information Processing Systems

Detecting text generated by Large Language Models (LLMs) is a pressing need in order to identify and prevent misuse of these powerful models in a wide range of applications, which have highly undesirable consequences such as misinformation and academic dishonesty. Given a piece of subject text, many existing detection methods work by measuring the difficulty of LLM predicting the next token in the text from their prefix. In this paper, we make a critical observation that how well the current token's output logits memorizes the closely preceding input tokens also provides strong evidence. Therefore, we propose a novel bi-directional calculation method that measures the cross-entropy losses between an output logits and the ground-truth token (forward) and between the output logits and the immediately preceding input token (backward). A classifier is trained to make the final prediction based on the statistics of these losses.


Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models

Neural Information Processing Systems

Diffusion models (DMs) produce very detailed and high-quality images. Prior efforts prevent this issue by either changing the input to the diffusion process, thereby preventing the DM from generating memorized samples during inference, or removing the memorized data from training altogether. While those are viable solutions when the DM is developed and deployed in a secure and constantly monitored environment, they hold the risk of adversaries circumventing the safeguards and are not effective when the DM itself is publicly released. To solve the problem, we introduce NeMo, the first method to localize memorization of individual data samples down to the level of neurons in DMs' cross-attention layers. Through our experiments, we make the intriguing finding that in many cases, single neurons are responsible for memorizing particular training samples.


Localizing Memorization in SSL Vision Encoders

Neural Information Processing Systems

Recent work on studying memorization in self-supervised learning (SSL) suggests that even though SSL encoders are trained on millions of images, they still memorize individual data points. While effort has been put into characterizing the memorized data and linking encoder memorization to downstream utility, little is known about where the memorization happens inside SSL encoders. To close this gap, we propose two metrics for localizing memorization in SSL encoders on a per-layer (LayerMem) and per-unit basis (UnitMem). Our localization methods are independent of the downstream task, do not require any label information, and can be performed in a forward pass. By localizing memorization in various encoder architectures (convolutional and transformer-based) trained on diverse datasets with contrastive and non-contrastive SSL frameworks, we find that (1) while SSL memorization increases with layer depth, highly memorizing units are distributed across the entire encoder, (2) a significant fraction of units in SSL encoders experiences surprisingly high memorization of individual data points, which is in contrast to models trained under supervision, (3) atypical (or outlier) data points cause much higher layer and unit memorization than standard data points, and (4) in vision transformers, most memorization happens in the fully-connected layers.


Rethinking LLM Memorization through the Lens of Adversarial Compression

Neural Information Processing Systems

Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or they integrate many data sources in some way more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on \emph{how we define memorization.} In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs. A given string from the training data is considered memorized if it can be elicited by a prompt (much) shorter than the string itself---in other words, if these strings can be compressed'' with the model by computing adversarial prompts of fewer tokens.


Identifying Legal Holdings with LLMs: A Systematic Study of Performance, Scale, and Memorization

arXiv.org Artificial Intelligence

As large language models (LLMs) continue to advance in capabilities, it is essential to assess how they perform on established benchmarks. In this study, we present a suite of experiments to assess the performance of modern LLMs (ranging from 3B to 90B+ parameters) on CaseHOLD, a legal benchmark dataset for identifying case holdings. Our experiments demonstrate scaling effects - performance on this task improves with model size, with more capable models like GPT4o and AmazonNovaPro achieving macro F1 scores of 0.744 and 0.720 respectively. These scores are competitive with the best published results on this dataset, and do not require any technically sophisticated model training, fine-tuning or few-shot prompting. To ensure that these strong results are not due to memorization of judicial opinions contained in the training data, we develop and utilize a novel citation anonymization test that preserves semantic meaning while ensuring case names and citations are fictitious. Models maintain strong performance under these conditions (macro F1 of 0.728), suggesting the performance is not due to rote memorization. These findings demonstrate both the promise and current limitations of LLMs for legal tasks with important implications for the development and measurement of automated legal analytics and legal benchmarks.