Statistics Caching Test-Time Adaptation for Vision-Language Models
Neural Information Processing Systems
Test-time adaptation (TTA) for Vision-Language Models (VLMs) aims to improve performance on unseen test data. However, existing methods struggle to accumulate knowledge robustly and continuously at test time. To address this, we propose Statistics Caching test-time Adaptation (SCA), a novel cache-based approach. Unlike traditional feature-caching methods, which are prone to forgetting, SCA continuously accumulates task-specific knowledge from all encountered test samples. By formulating the reuse of past features as a least squares problem, SCA avoids storing raw features and instead maintains compact, incrementally updated feature statistics. This design enables efficient online adaptation without the limitations of fixed-size caches, so the accumulated knowledge keeps growing over time. Furthermore, we introduce adaptive strategies that use the VLM's prediction uncertainty to reduce the impact of noisy pseudo-labels and to dynamically balance multiple prediction sources, leading to more robust and reliable performance. Extensive experiments demonstrate that SCA achieves compelling performance while maintaining competitive computational efficiency. The code is available at this link.
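To make the idea of caching statistics rather than raw features concrete, here is a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so this assumes a generic ridge-regularized least squares classifier whose sufficient statistics (a Gram matrix `G` and a feature-label matrix `C`) are updated one sample at a time. The class name `StatsCache`, the `ridge` parameter, and the entropy-based confidence weight are all hypothetical choices made for illustration.

```python
import math


class StatsCache:
    """Hypothetical sketch: running least-squares statistics, no raw features stored."""

    def __init__(self, dim, num_classes, ridge=1.0):
        self.dim, self.num_classes, self.ridge = dim, num_classes, ridge
        # G accumulates sum_i w_i * x_i x_i^T  (dim x dim)
        self.G = [[0.0] * dim for _ in range(dim)]
        # C accumulates sum_i w_i * x_i y_i^T  (dim x num_classes), y_i one-hot
        self.C = [[0.0] * num_classes for _ in range(dim)]

    def update(self, x, probs):
        """Fold one test sample into the statistics.

        x: feature vector; probs: the model's class probabilities for x.
        The pseudo-label is the argmax class. The weight (an assumption of
        this sketch) is 1 minus the normalized entropy, so uncertain
        predictions contribute less.
        """
        label = max(range(self.num_classes), key=lambda k: probs[k])
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        w = 1.0 - entropy / math.log(self.num_classes)
        for i in range(self.dim):
            for j in range(self.dim):
                self.G[i][j] += w * x[i] * x[j]
            self.C[i][label] += w * x[i]

    def solve(self):
        """Return W = (G + ridge * I)^{-1} C using Gauss-Jordan elimination."""
        d = self.dim
        # Augmented matrix [G + ridge * I | C]
        A = [
            [self.G[i][j] + (self.ridge if i == j else 0.0) for j in range(d)]
            + list(self.C[i])
            for i in range(d)
        ]
        for col in range(d):
            pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
            A[col], A[pivot] = A[pivot], A[col]
            pv = A[col][col]
            A[col] = [v / pv for v in A[col]]
            for r in range(d):
                if r != col:
                    f = A[r][col]
                    A[r] = [a - f * b for a, b in zip(A[r], A[col])]
        return [row[d:] for row in A]

    def predict(self, x):
        """Class scores x^T W from the current statistics."""
        W = self.solve()
        return [
            sum(x[i] * W[i][c] for i in range(self.dim))
            for c in range(self.num_classes)
        ]


cache = StatsCache(dim=2, num_classes=2, ridge=0.1)
cache.update([1.0, 0.0], [0.9, 0.1])  # confident pseudo-label: class 0
cache.update([0.0, 1.0], [0.1, 0.9])  # confident pseudo-label: class 1
scores = cache.predict([1.0, 0.0])
```

Memory here depends only on the feature dimension and the number of classes, not on how many samples have been seen, which is the property the abstract contrasts with fixed-size feature caches. A practical version would use a linear algebra library instead of the hand-written solver.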
Jun-14-2026, 13:56:33 GMT
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Industry:
- Information Technology > Security & Privacy (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Natural Language (1.00)
- Machine Learning > Neural Networks (0.46)