AITopics

2510.16022

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Tonekaboni, Sana, Stempfle, Lena, Fallahpour, Adibvafa, Gerych, Walter, Ghassemi, Marzyeh

An Investigation of Memorization Risk in Healthcare Foundation Models

arXiv.org Artificial IntelligenceOct-16-2025

Foundation models trained on large-scale de-identified electronic health records (EHRs) hold promise for clinical applications. However, their capacity to memorize patient information raises important privacy concerns. In this work, we introduce a suite of black-box evaluation tests to assess privacy-related memorization risks in foundation models trained on structured EHR data. Our framework includes methods for probing memorization at both the embedding and generative levels, and aims to distinguish between model generalization and harmful memorization in clinically relevant settings. We contextualize memorization in terms of its potential to compromise patient privacy, particularly for vulnerable subgroups. We validate our approach on a publicly available EHR foundation model and release an open-source toolkit to facilitate reproducible and collaborative privacy assessments in healthcare AI.

information, large language model, machine learning, (19 more...)

2510.1295

Country: North America > Canada > Ontario (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Kowalczuk, Antoni, Hintersdorf, Dominik, Struppek, Lukas, Kersting, Kristian, Dziedzic, Adam, Boenisch, Franziska

Finding Dori: Memorization in Text-to-Image Diffusion Models Is Not Local

arXiv.org Artificial IntelligenceOct-15-2025

Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intellectual property remain due to their potential to inadvertently memorize and replicate training data. Recent mitigation efforts have focused on identifying and pruning weights responsible for triggering verbatim training data replication, based on the assumption that memorization can be localized. We challenge this assumption and demonstrate that, even after such pruning, small perturbations to the text embeddings of previously mitigated prompts can re-trigger data replication, revealing the fragility of such defenses. Our further analysis then provides multiple indications that memorization is indeed not inherently local: (1) replication triggers for memorized images are distributed throughout text embedding space; (2) embeddings yielding the same replicated image produce divergent model activations; and (3) different pruning methods identify inconsistent sets of memorization-related weights for the same image. Finally, we show that bypassing the locality assumption enables more robust mitigation through adversarial fine-tuning. These findings provide new insights into the nature of memorization in text-to-image DMs and inform the development of more reliable mitigations against DM memorization.

artificial intelligence, machine learning, memorization, (17 more...)

2507.1688

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.85)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Udsa, Tinnakit, Udomcharoenchaikit, Can, Payoungkhamdee, Patomporn, Nutanong, Sarana, Rattanavipanon, Norrathep

Exploring Cross-Client Memorization of Training Data in Large Language Models for Federated Learning

arXiv.org Artificial IntelligenceOct-13-2025

Federated learning (FL) enables collaborative training without raw data sharing, but still risks training data memorization. Existing FL memorization detection techniques focus on one sample at a time, underestimating more subtle risks of cross-sample memorization. In contrast, recent work on centralized learning (CL) has introduced fine-grained methods to assess memorization across all samples in training data, but these assume centralized access to data and cannot be applied directly to FL. We bridge this gap by proposing a framework that quantifies both intra- and inter-client memorization in FL using fine-grained cross-sample memorization measurement across all clients. Based on this framework, we conduct two studies: (1) measuring subtle memorization across clients and (2) examining key factors that influence memorization, including decoding strategies, prefix length, and FL algorithms. Our findings reveal that FL models do memorize client data, particularly intra-client data, more than inter-client data, with memorization influenced by training and inferencing factors.

artificial intelligence, machine learning, memorization, (16 more...)

2510.0875

Country:

North America > United States (0.28)
Asia > Thailand (0.28)
Asia > Japan (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Health & Medicine > Consumer Health (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Neural Information Processing SystemsOct-11-2025, 00:24:10 GMT

Memorization with Compression

arxiv preprint arxiv, memorization, training data, (13 more...)

Country:

Europe (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Indiana > Marion County > Indianapolis (0.04)
North America > United States > California (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry:

Law (1.00)
Government > Regional Government (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
(3 more...)

Neural Information Processing SystemsOct-10-2025, 11:46:46 GMT

Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models

Diffusion models (DMs) produce very detailed and high-quality images.

memorization, memorization neuron, neuron, (14 more...)

Country: Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.57)

Neural Information Processing SystemsOct-10-2025, 03:34:55 GMT

D ej ` a vu Memorization in Vision-Language Models

Vision-Language Models (VLMs) have emerged as the state-of-the-art representation learning solution, with myriads of downstream applications such as image classification, retrieval and generation. A natural question is whether these models memorize their training data, which also has implications for generalization. We propose a new method for measuring memorization in VLMs, which we call d ej ` a vu memorization . For VLMs trained on image-caption pairs, we show that the model indeed retains information about individual objects in the training images beyond what can be inferred from correlations or the image caption. We evaluate d ej ` a vu memorization at both sample and population level, and show that it is significant for OpenCLIP trained on as many as 50M image-caption pairs. Finally, we show that text randomization considerably mitigates memorization while only moderately impacting the model's downstream task performance.

information, memorization, vu memorization, (15 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

arXiv.org Artificial IntelligenceOct-10-2025

TTOM: Test-Time Optimization and Memorization for Compositional Video Generation

Qu, Leigang, Wang, Ziyang, Zheng, Na, Wang, Wenjie, Nie, Liqiang, Chua, Tat-Seng

Video Foundation Models (VFMs) exhibit remarkable visual generation performance, but struggle in compositional scenarios (e.g., motion, numeracy, and spatial relation). In this work, we introduce Test-Time Optimization and Memorization (TTOM), a training-free framework that aligns VFM outputs with spatiotemporal layouts during inference for better text-image alignment. Rather than direct intervention to latents or attention per-sample in existing work, we integrate and optimize new parameters guided by a general layout-attention objective. Furthermore, we formulate video generation within a streaming setting, and maintain historical optimization contexts with a parametric memory mechanism that supports flexible operations, such as insert, read, update, and delete. Notably, we found that TTOM disentangles compositional world knowledge, showing powerful transferability and generalization. Experimental results on the T2V-CompBench and Vbench benchmarks establish TTOM as an effective, practical, scalable, and efficient framework to achieve cross-modal alignment for compositional video generation on the fly.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

2510.0794

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Tao, Jiashu, Shokri, Reza

(Token-Level) InfoRMIA: Stronger Membership Inference and Memorization Assessment for LLMs

arXiv.org Artificial IntelligenceOct-10-2025

Machine learning models are known to leak sensitive information, as they inevitably memorize (parts of) their training data. More alarmingly, large language models (LLMs) are now trained on nearly all available data, which amplifies the magnitude of information leakage and raises serious privacy risks. Hence, it is more crucial than ever to quantify privacy risk before the release of LLMs. The standard method to quantify privacy is via membership inference attacks, where the state-of-the-art approach is the Robust Membership Inference Attack (RMIA). In this paper, we present InfoRMIA, a principled information-theoretic formulation of membership inference. Our method consistently outperforms RMIA across benchmarks while also offering improved computational efficiency. In the second part of the paper, we identify the limitations of treating sequence-level membership inference as the gold standard for measuring leakage. We propose a new perspective for studying membership and memorization in LLMs: token-level signals and analyses. We show that a simple token-based InfoRMIA can pinpoint which tokens are memorized within generated outputs, thereby localizing leakage from the sequence level down to individual tokens, while achieving stronger sequence-level inference power on LLMs. This new scope rethinks privacy in LLMs and can lead to more targeted mitigation, such as exact unlearning.

large language model, machine learning, natural language, (17 more...)

2510.05582

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.62)

Neural Information Processing SystemsOct-9-2025, 07:32:28 GMT

Post-processing Private Synthetic Data for Improving Utility on Selected Measures Hao Wang, Shivchander Sudalairaj, John Henning, Kristjan Greenewald, Akash Srivastava MIT-IBM Watson AI Lab

The advancement of machine learning (ML) techniques relies on large amounts of training data. However, data collection also poses a significant risk of exposing private information.

mechanism, real data, synthetic data, (13 more...)

Country:

North America > United States (0.46)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Case Based Reasoning (0.40)