Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

Oct-11-2024, 05:27:10 GMT–Neural Information Processing Systems

Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot generalization on many downstream tasks when given properly designed text prompts. Instead of relying on hand-engineered prompts, recent works learn prompts from the training data of downstream tasks. While effective, training on domain-specific data reduces a model's ability to generalize to unseen domains. In this work, we propose test-time prompt tuning (TPT), a method that learns adaptive prompts on the fly from a single test sample. TPT optimizes the prompt by minimizing entropy with confidence selection, so that the model makes consistent predictions across different augmented views of each test sample.
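
The objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "confidence selection" means keeping the lowest-entropy fraction of augmented views, and that the entropy being minimized is that of the class distribution averaged over the kept views. The function names and the `keep_fraction` parameter are hypothetical, and the gradient step that would update the prompt is omitted; only the loss value is computed, from per-view class logits that a model such as CLIP would supply.

```python
import math


def softmax(logits):
    """Convert one view's class logits into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_confident(view_probs, keep_fraction):
    """Keep the lowest-entropy views (assumed form of confidence selection)."""
    k = max(1, int(len(view_probs) * keep_fraction))
    return sorted(view_probs, key=entropy)[:k]


def marginal_entropy(view_probs):
    """Entropy of the class distribution averaged over the given views."""
    n = len(view_probs)
    num_classes = len(view_probs[0])
    avg = [sum(p[c] for p in view_probs) / n for c in range(num_classes)]
    return entropy(avg)


def tpt_objective(view_logits, keep_fraction=0.5):
    """Loss a test-time tuner would minimize with respect to the prompt.

    keep_fraction is an illustrative value, not one taken from the source.
    """
    probs = [softmax(logits) for logits in view_logits]
    confident = select_confident(probs, keep_fraction)
    return marginal_entropy(confident)


# Four augmented views of one test image, three candidate classes.
consistent_views = [[5.0, 0.0, 0.0], [4.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
conflicting_views = [[5.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0], [0.1, 0.0, 0.0]]

print("consistent :", tpt_objective(consistent_views))
print("conflicting:", tpt_objective(conflicting_views))
```

Under these assumptions the loss is lower when the confident views agree on one class than when they disagree, which is the sense in which minimizing it encourages consistent predictions across views.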

test-time prompt tuning, vision-language model, zero-shot generalization

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)