AITopics | Memory-Based Learning

2510.03827

Country:

North America > United States > Massachusetts (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.61)

arXiv.org Artificial IntelligenceOct-7-2025

Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test

Li, Ziyue, Fan, Chenrui, Zhou, Tianyi

This paper presents the first study of grokking in practical LLM pretraining. Specifically, we investigate when an LLM memorizes the training data, when its generalization on downstream tasks starts to improve, and what happens if there is a lag between the two. Unlike existing works studying when a small model generalizes to limited and specified tasks during thousands epochs' training on algorithmic data, we focus on a practical setting for LLMs, i.e., one-epoch pretraining of next-token prediction on a cross-domain, large-scale corpus, and generalization on diverse benchmark tasks covering math/commonsense reasoning, code generation, and domain-specific retrieval. Our study, for the first time, verifies that grokking still emerges in pretraining mixture-of-experts (MoE) LLMs, though different local data groups may enter their grokking stages asynchronously due to the heterogeneity of their distributions and attributions to others. To find a mechanistic interpretation of this local grokking, we investigate the dynamics of training data's pathways (i.e., expert choices across layers in MoE). Our primary discovery is that the pathways evolve from random, non-smooth across layers, instance-specific to more structured and transferable across samples, despite the converged pretraining loss. This depicts a transition from memorization to generalization. Two novel metrics are developed to quantify these patterns: one computes the pathway similarity between samples, while the other measures the consistency of aggregated experts between subsequent layers for each sample. These training data based metrics induce zero cost but can faithfully track and monitor the generalization of LLMs on downstream tasks, which, in conventional settings, requires costly instruction tuning and benchmark evaluation.

large language model, machine learning, natural language, (20 more...)

2506.21551

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.65)

arXiv.org Artificial IntelligenceOct-7-2025

LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation

Kim, Chaeeun, Lee, Jinu, Hwang, Wonseok

Legal Case Retrieval (LCR), which retrieves relevant cases from a query case, is a fundamental task for legal professionals in research and decision-making. However, existing studies on LCR face two major limitations. First, they are evaluated on relatively small-scale retrieval corpora (e.g., 100-55K cases) and use a narrow range of criminal query types, which cannot sufficiently reflect the complexity of real-world legal retrieval scenarios. Second, their reliance on embedding-based or lexical matching methods often results in limited representations and legally irrelevant matches. To address these issues, we present: (1) LEGAR BENCH, the first large-scale Korean LCR benchmark, covering 411 diverse crime types in queries over 1.2M candidate cases; and (2) LegalSearchLM, a retrieval model that performs legal element reasoning over the query case and directly generates content containing those elements, grounded in the target cases through constrained decoding. Experimental results show that LegalSearchLM outperforms baselines by 6-20% on LEGAR BENCH, achieving state-of-the-art performance. It also demonstrates strong generalization to out-of-domain cases, outperforming naive generative models trained on in-domain data by 15%.

artificial intelligence, machine learning, natural language, (14 more...)

2505.23832

Country:

Asia (0.93)
North America > United States (0.46)

Genre: Research Report (0.84)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.85)

Neural Information Processing SystemsOct-3-2025, 02:52:28 GMT

Neural Networks Learning and Memorization with (almost) no Over-Parameterization

Many results in recent years established polynomial time learnability of various models via neural networks algorithms (e.g.

artificial intelligence, machine learning, neural network, (14 more...)

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.42)

arXiv.org Artificial IntelligenceOct-3-2025

Reason to Rote: Rethinking Memorization in Reasoning

Du, Yupei, Mondorf, Philipp, Casola, Silvia, Yao, Yuekun, Litschko, Robert, Plank, Barbara

Large language models readily memorize arbitrary training instances, such as label noise, yet they perform strikingly well on reasoning tasks. In this work, we investigate how language models memorize label noise, and why such memorization in many cases does not heavily affect generalizable reasoning capabilities. Using two controllable synthetic reasoning datasets with noisy labels, four-digit addition (FDA) and two-hop relational reasoning (THR), we discover a reliance of memorization on generalizable reasoning mechanisms: models continue to compute intermediate reasoning outputs even when retrieving memorized noisy labels, and intervening reasoning adversely affects memorization. We further show that memorization operates through distributed encoding, i.e., aggregating various inputs and intermediate results, rather than building a look-up mechanism from inputs to noisy labels. Moreover, our FDA case study reveals memorization occurs via outlier heuristics, where existing neuron activation patterns are slightly shifted to fit noisy labels. Together, our findings suggest that memorization of label noise in language models builds on, rather than overrides, the underlying reasoning mechanisms, shedding lights on the intriguing phenomenon of benign memorization.

large language model, machine learning, natural language, (19 more...)

2507.04782

Country:

Europe > Austria > Vienna (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Public Health (0.72)
Health & Medicine > Government Relations & Public Policy (0.72)
Government > Regional Government > North America Government > United States Government > FDA (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Neural Information Processing SystemsOct-2-2025, 20:41:11 GMT

Export Reviews, Discussions, Author Feedback and Meta-Reviews

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes the Latent Case Model (LCM), a Bayesian approach to clustering in which clusters are represented by a prototype (a specific sample from the data) and feature subspaces (a binary subset of the variables signifying those features that are relevant to the class). The approach is presented as being a Bayesian, trainable version of the Case-Based Reasoning approach popular in AI, and is motivated by the ways such models have proved highly effective in explaining human decision making. The generative model (Figure 1) represents each item as coming from a mixture of S clusters, where each cluster is represented by a prototype and subspace (as above) and a function \phi which generates features matching those of the prototype with high probability for features in the subspace, and uniform features outside it. The model is thus similar in functionality to LDA but quite different in terms of its representation.

human interpretability, interpretability, lda, (14 more...)

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
(2 more...)

Neural Information Processing SystemsOct-2-2025, 20:41:10 GMT

The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification

Been Kim, Cynthia Rudin, Julie A. Shah

Neural Information Processing Systems http://nips.cc/

bayesian case model, case-based reasoning and prototype classification, generative approach

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.85)

Gaul, Gabriel de Olim, Gould, Adam, Kori, Avinash, Toni, Francesca

Object-Centric Case-Based Reasoning via Argumentation

arXiv.org Artificial IntelligenceOct-2-2025

We introduce Slot Attention Argumentation for Case-Based Reasoning (SAA-CBR), a novel neuro-symbolic pipeline for image classification that integrates object-centric learning via a neural Slot Attention (SA) component with symbolic reasoning conducted by Abstract Argumentation for Case-Based Reasoning (AA-CBR). We explore novel integrations of AA-CBR with the neural component, including feature combination strategies, casebase reduction via representative samples, novel count-based partial orders, a One-Vs-Rest strategy for extending AA-CBR to multi-class classification, and an application of Supported AA-CBR, a bipolar variant of AA-CBR. We demonstrate that SAA-CBR is an effective classifier on the CLEVR-Hans datasets, showing competitive performance against baseline models.

artificial intelligence, machine learning, natural language, (15 more...)

2510.00185

Country: Europe > United Kingdom (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing SystemsSep-30-2025, 09:03:57 GMT

The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification

We present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering. BCM brings the intuitive power of CBR to a Bayesian generative framework. The BCM learns prototypes, the ``quintessential observations that best represent clusters in a dataset, by performing joint inference on cluster labels, prototypes and important features. Simultaneously, BCM pursues sparsity by learning subspaces, the sets of features that play important roles in the characterization of the prototypes. The prototype and subspace representation provides quantitative benefits in interpretability while preserving classification accuracy. Human subject experiments verify statistically significant improvements to participants' understanding when using explanations produced by BCM, compared to those given by prior art.

bayesian case model, case-based reasoning and prototype classification, generative approach, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.70)

Do, Tue, Chandrasekaran, Varun, Alabi, Daniel

Efficiently Attacking Memorization Scores

arXiv.org Artificial IntelligenceSep-30-2025

Influence estimation tools -- such as memorization scores -- are widely used to understand model behavior, attribute training data, and inform dataset curation. However, recent applications in data valuation and responsible machine learning raise the question: can these scores themselves be adversarially manipulated? In this work, we present a systematic study of the feasibility of attacking memorization-based influence estimators. We characterize attacks for producing highly memorized samples as highly sensitive queries in the regime where a trained algorithm is accurate. Our attack (calculating the pseudoinverse of the input) is practical, requiring only black-box access to model outputs and incur modest computational overhead. We empirically validate our attack across a wide suite of image classification tasks, showing that even state-of-the-art proxies are vulnerable to targeted score manipulations. In addition, we provide a theoretical analysis of the stability of memorization scores under adversarial perturbations, revealing conditions under which influence estimates are inherently fragile. Our findings highlight critical vulnerabilities in influence-based attribution and suggest the need for robust defenses. All code can be found at https://github.com/tuedo2/MemAttack

artificial intelligence, deep learning, machine learning, (17 more...)

2509.20463

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)