Déjà Vu Memorization in Vision-Language Models
Neural Information Processing Systems
Vision-Language Models (VLMs) have emerged as the state-of-the-art representation learning solution, with myriad downstream applications such as image classification, retrieval, and generation. A natural question is whether these models memorize their training data, which also has implications for generalization. We propose a new method for measuring memorization in VLMs, which we call déjà vu memorization. For VLMs trained on image-caption pairs, we show that the model indeed retains information about individual objects in the training images beyond what can be inferred from correlations or the image caption. We evaluate déjà vu memorization at both the sample and population levels, and show that it is significant for OpenCLIP trained on as many as 50M image-caption pairs. Finally, we show that text randomization considerably mitigates memorization while only moderately impacting the model's downstream task performance.
Oct-10-2025, 03:34:55 GMT
- Country:
- Africa > Central African Republic
- Ombella-M'Poko > Bimbo (0.04)
- Europe
- Poland (0.04)
- Switzerland > Zürich
- Zürich (0.14)
- North America > United States
- California (0.04)
- South America > Chile
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: