
 test sample


DOTA: DistributiOnal Test-time Adaptation of Vision-Language Models

Neural Information Processing Systems

However, deploying these models can be unreliable when significant distribution gaps exist between training and test data, while fine-tuning for diverse scenarios is often costly. This creates a need for methods that can efficiently adapt to new data at test time without expensive retraining. Cache-based test-time adapters serve this purpose by storing representative test samples to guide subsequent classifications. Yet, these methods typically employ naive cache management with limited capacity, leading to severe catastrophic forgetting when samples are inevitably dropped during updates. In this paper, we propose Dota (DistributiOnal Test-time Adaptation), a simple yet effective method addressing this limitation. Crucially, instead of merely memorizing individual test samples, Dota continuously estimates the underlying distribution of the test data stream. Test-time posterior probabilities are then computed from these dynamically estimated distributions via Bayes' theorem for adaptation. This distribution-centric approach enables the model to continually learn and adapt to the deployment environment. Extensive experiments validate that Dota significantly mitigates forgetting and achieves state-of-the-art performance compared to existing methods.
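As a rough illustration of the distribution-centric idea described above, the following Python sketch keeps running per-class Gaussian estimates over a stream of features and converts them into posteriors with Bayes' theorem. It is not the authors' implementation: the class name `OnlineGaussianAdapter`, the diagonal-covariance assumption, the soft-count update driven by pseudo-label probabilities, and the toy two-cluster stream are all assumptions made for this example.

```python
import numpy as np


class OnlineGaussianAdapter:
    """Hypothetical helper: running diagonal-Gaussian estimate per class."""

    def __init__(self, num_classes, dim, eps=1e-3):
        self.eps = eps
        self.weight = np.full(num_classes, eps)      # soft sample count per class
        self.sum_x = np.zeros((num_classes, dim))    # weighted sum of features
        self.sum_x2 = np.zeros((num_classes, dim))   # weighted sum of squared features

    def update(self, x, probs):
        """Fold one test feature into the statistics, weighted by its class probabilities."""
        self.weight += probs
        self.sum_x += probs[:, None] * x
        self.sum_x2 += probs[:, None] * x ** 2

    def posterior(self, x):
        """Bayes' theorem: p(k | x) is proportional to p(x | k) * p(k)."""
        mean = self.sum_x / self.weight[:, None]
        var = self.sum_x2 / self.weight[:, None] - mean ** 2 + self.eps
        log_lik = -0.5 * np.sum((x - mean) ** 2 / var + np.log(var), axis=1)
        log_prior = np.log(self.weight / self.weight.sum())
        logits = log_lik + log_prior
        logits -= logits.max()  # stabilise the softmax
        p = np.exp(logits)
        return p / p.sum()


# Toy stream (assumed data): two well-separated clusters with noisy soft pseudo-labels.
rng = np.random.default_rng(0)
centers = np.array([[2.0, 0.0], [-2.0, 0.0]])
adapter = OnlineGaussianAdapter(num_classes=2, dim=2)
for step in range(200):
    k = step % 2
    feature = centers[k] + 0.1 * rng.normal(size=2)
    soft_label = np.where(np.arange(2) == k, 0.9, 0.1)
    adapter.update(feature, soft_label)

posterior_at_center0 = adapter.posterior(centers[0])
posterior_at_center1 = adapter.posterior(centers[1])
```

Because only sums are stored, memory stays constant no matter how many test samples arrive, which is the property that distinguishes this kind of estimator from a fixed-size sample cache.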


Statistics Caching Test-Time Adaptation for Vision-Language Models

Neural Information Processing Systems

Test-time adaptation (TTA) for Vision-Language Models (VLMs) aims to enhance performance on unseen test data. However, existing methods struggle to achieve robust and continuous knowledge accumulation during test time. To address this, we propose Statistics Caching test-time Adaptation (SCA), a novel cache-based approach. Unlike traditional feature-caching methods prone to forgetting, SCA continuously accumulates task-specific knowledge from all encountered test samples. By formulating the reuse of past features as a least squares problem, SCA avoids storing raw features and instead maintains compact, incrementally updated feature statistics. This design enables efficient online adaptation without the limitations of fixed-size caches, ensuring that the accumulated knowledge grows persistently over time. Furthermore, we introduce adaptive strategies that leverage the VLM's prediction uncertainty to reduce the impact of noisy pseudo-labels and dynamically balance multiple prediction sources, leading to more robust and reliable performance. Extensive experiments demonstrate that SCA achieves compelling performance while maintaining competitive computational efficiency. The code is available at this link.
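The idea of replacing a feature cache with incrementally updated statistics for a least squares problem can be sketched as follows. This is a generic ridge-regression illustration, not the SCA code: the class name `StatisticsCache`, the ridge term, the one-hot pseudo-label targets, and the random toy data are assumptions introduced here.

```python
import numpy as np


class StatisticsCache:
    """Hypothetical helper: ridge regression maintained from running statistics."""

    def __init__(self, dim, num_classes, ridge=1.0):
        self.gram = ridge * np.eye(dim)             # running X^T X + ridge * I
        self.cross = np.zeros((dim, num_classes))   # running X^T Y

    def update(self, feature, target):
        """Accumulate one sample; the raw feature is not stored afterwards."""
        self.gram += np.outer(feature, feature)
        self.cross += np.outer(feature, target)

    def weights(self):
        """Solve (X^T X + ridge * I) W = X^T Y for the current statistics."""
        return np.linalg.solve(self.gram, self.cross)

    def predict(self, feature):
        return feature @ self.weights()


# Toy check (assumed data): the streaming solution matches the batch ridge solution.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 4))
targets = np.eye(3)[rng.integers(0, 3, size=50)]  # one-hot pseudo-labels

cache = StatisticsCache(dim=4, num_classes=3)
for f, t in zip(features, targets):
    cache.update(f, t)

batch_weights = np.linalg.solve(features.T @ features + np.eye(4), features.T @ targets)
matches_batch = np.allclose(cache.weights(), batch_weights)
```

The two statistics have fixed size (`dim x dim` and `dim x num_classes`), so every past sample contributes to the solution without any sample being evicted.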


Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing

arXiv.org Machine Learning

This paper addresses structured out-of-distribution (OOD) testing in high-stakes machine learning applications. Traditional conformal methods rely on joint exchangeability, making it difficult to incorporate auxiliary information such as spatiotemporal or grouping structures. To overcome this limitation, we propose the structure-adaptive conformal q-value (SCQ), a significance index that integrates individual test evidence with structural patterns. We also develop pseudo-score-guided transductive automated model selection (P-TAMS), which adapts conformalized model selection to structured OOD testing across a toolbox of candidate models. Together, SCQ and P-TAMS form a unified framework under pairwise exchangeability, providing finite-sample error-rate control, improved power, and enhanced interpretability. Experiments on simulated and real data demonstrate that the proposed approach controls the false discovery rate and performs well across diverse settings.
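For context, the standard conformal baseline that such methods build on can be sketched in a few lines: conformal p-values computed from calibration scores, followed by the Benjamini-Hochberg procedure for false discovery rate control. This is not SCQ or P-TAMS and uses no structural information; the function names, the convention that larger scores are more anomalous, and the toy scores are assumptions for illustration.

```python
def conformal_p_values(calibration_scores, test_scores):
    """p-value of each test score against inlier calibration scores.

    Larger scores are assumed to indicate more anomalous samples.
    """
    n = len(calibration_scores)
    return [
        (1 + sum(c >= t for c in calibration_scores)) / (n + 1)
        for t in test_scores
    ]


def benjamini_hochberg(p_values, alpha=0.1):
    """Return the sorted indices rejected by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    num_rejected = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            num_rejected = rank  # keep the largest rank that passes its threshold
    return sorted(order[:num_rejected])


# Toy data (assumed): 99 inlier calibration scores and four test scores.
calibration = list(range(1, 100))
test = [10, 100, 6, 200]
p_values = conformal_p_values(calibration, test)
flagged = benjamini_hochberg(p_values, alpha=0.1)
```

Here the second and fourth test scores exceed every calibration score, so they receive the smallest possible p-value, 1/100, and are the only ones flagged as out-of-distribution.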


Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

Neural Information Processing Systems

The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption using prompt learning for numerous downstream tasks. Previous works have shown test-time prompt tuning using entropy minimization to adapt text prompts for unseen domains. While effective, this overlooks the key cause of performance degradation on unseen domains: distribution shift. In this work, we explicitly handle this problem by aligning the out-of-distribution (OOD) test sample statistics to those of the source data using prompt tuning. We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift to bridge the gap in the test domain. Evaluated on the domain generalization benchmark, our method improves zero-shot top-1 accuracy beyond existing prompt-learning techniques, with a 3.08% improvement over the baseline MaPLe. In cross-dataset generalization with unseen categories across 10 datasets, our method improves consistently across all datasets compared to the existing state-of-the-art.
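One simple way to quantify the mismatch between test-sample statistics and source statistics is sketched below. It only evaluates a loss; in a prompt-tuning setting this quantity would be minimized with respect to the prompts. The function name `alignment_loss`, the use of per-dimension mean and variance as the statistics, the L1 distance, and the toy arrays are assumptions of this example rather than details taken from the abstract.

```python
import numpy as np


def alignment_loss(test_features, source_mean, source_var):
    """L1 gap between test feature statistics and precomputed source statistics.

    test_features: array of shape (num_views, dim), e.g. features of several
    augmented views of one test sample (an assumption of this sketch).
    """
    mean = test_features.mean(axis=0)
    var = test_features.var(axis=0)
    mean_gap = np.abs(mean - source_mean).mean()
    var_gap = np.abs(var - source_var).mean()
    return float(mean_gap + var_gap)


# Toy statistics (assumed values, not from any dataset).
source_mean = np.array([1.0, 1.0])
source_var = np.array([1.0, 1.0])

matched = np.array([[0.0, 0.0], [2.0, 2.0]])  # mean 1, variance 1 per dimension
shifted = matched + 1.0                       # same variance, mean moved by 1

loss_matched = alignment_loss(matched, source_mean, source_var)
loss_shifted = alignment_loss(shifted, source_mean, source_var)
```

The matched features give a loss of zero, while shifting every feature by one raises the loss to one, since only the mean term changes.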


584b98aac2dddf59ee2cf19ca4ccb75e-Supplemental.pdf

Neural Information Processing Systems

We used the largest batch size that could fit in memory on our limited hardware, which was 256 for an image size of 224×224. For the learning rate (Adam [2] optimizer), we searched over {0.001, 0.0001, 1e-4, 5e-4, 5e-5}, with weight decay {0, 5e-4. We chose a weight decay of 5e-5 and a learning rate of 5e-4 until the 4:6 split and 1e-4 afterwards. We chose a prototype dimension of 256, a backbone output of 512, 2 graph layers, a graph hidden dimension of 512, λ_h of 10, and Clst and Sep of 0.01. For UT-Zappos, we again used the Adam optimizer, with learning rate in {5e-5, 5e-4, 5e-3} and weight decay {0, 5e-4.


4c4c937b67cc8d785cea1e42ccea185c-Supplemental.pdf

Neural Information Processing Systems

Proof of Proposition 1. Due to Jensen's inequality and the fact that, by assumption, the distribution of human predictions $P(h \mid x)$ is not a point mass, it holds that $\mathbb{E}_{h}[\ell(h(x), y) \mid x] > \ell(\mu_h(x), y)$.

Proof of Theorem 3. We first provide the proof of the unconstrained case. Note that the above problem is a linear program and it decouples with respect to $x$. Therefore, for each $x$, the optimal solution is given by

$$\pi_m(d = 1 \mid x) = \begin{cases} 1 & \text{if } \mathbb{E}_{y \mid x}\big[\ell(m(x), y) - \mathbb{E}_{h \mid x}[\ell(h, y)]\big] > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Next, we provide the proof of the constrained case. To this aim, we consider the dual formulation of the optimization problem, where we only introduce a Lagrangian multiplier $\tau_{P,b}$ for the first constraint, i.e.,

$$\text{maximize} \quad \mathbb{E}_{x}\Big[\pi(x)\big(\mathbb{E}_{y,h \mid x}[\ell(h, y)] - \mathbb{E}_{y \mid x}[\ell(m(x), y)]\big)\Big] + \mathbb{E}_{x}\big[\tau_{P,b}\,(\pi(x) - b)\big] \tag{13}$$

$$\text{subject to} \quad 0 \le \pi(x) \le 1 \quad \forall x \in \mathcal{X}. \tag{14}$$

The inner minimization problem can be solved using an argument similar to the one for the unconstrained case.





Single Layer Predictive Normalized Maximum Likelihood for Out-of-Distribution Detection

Neural Information Processing Systems

Detecting out-of-distribution (OOD) samples is vital for developing machine-learning-based models for critical safety systems. Common approaches for OOD detection assume access to some OOD samples during training, which may not be available in a real-life scenario. Instead, we utilize the predictive normalized maximum likelihood (pNML) learner, in which no assumptions are made on the tested input. We derive an explicit expression of the pNML and its generalization error, denoted as the regret, for a single layer neural network (NN). We show that this learner generalizes well when (i) the test vector resides in a subspace spanned by the eigenvectors associated with the large eigenvalues of the empirical correlation matrix of the training data, or (ii) the test sample is far from the decision boundary. Furthermore, we describe how to efficiently apply the derived pNML regret to any pretrained deep NN, by employing the explicit pNML for the last layer, followed by the softmax function. Applying the derived regret to a deep NN requires neither additional tunable parameters nor extra data. We extensively evaluate our approach on 74 OOD detection benchmarks using DenseNet-100, ResNet-34, and WideResNet-40 models trained with CIFAR-100, CIFAR-10, SVHN, and ImageNet-30, showing a significant improvement of up to 15.6% over recent leading methods.