IPO: Interpretable Prompt Optimization for Vision-Language Models
AIM Lab, University of Amsterdam
Neural Information Processing Systems
Pre-trained vision-language models such as CLIP adapt remarkably well to a variety of downstream tasks. Nonetheless, their performance depends heavily on the specific wording of the input text prompts, which demands skillful prompt-template engineering. To avoid manual engineering, current approaches to prompt optimization learn the prompts through gradient descent, treating them as adjustable parameters. However, these methods tend to overfit the base classes seen during training and produce prompts that are no longer understandable by humans. This paper introduces a simple but interpretable prompt optimizer (IPO) that utilizes large language models (LLMs) to generate textual prompts dynamically.
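The idea described above, an LLM proposing human-readable prompt candidates that are scored on a downstream task and fed back as context, can be sketched as a simple search loop. This is an illustrative sketch, not the paper's implementation: `propose_prompts` is a stand-in for an actual LLM call, and `score_prompt` is a toy stand-in for measuring a CLIP model's zero-shot accuracy with the candidate template.

```python
# Sketch of an interpretable prompt-optimization loop: an LLM (stubbed here)
# proposes candidate prompt templates, each candidate is scored on a
# downstream task, and the best-scoring prompts are kept as context for
# the next proposal round. All names below are illustrative assumptions.

def propose_prompts(history):
    """Stand-in for an LLM call: given (prompt, score) history, return
    new candidate templates. A real system would prompt an LLM with the
    scored history and ask it to propose improved templates."""
    candidates = [
        "a photo of a {}",
        "a blurry photo of a {}",
        "an image of a {}, a type of object",
    ]
    seen = {prompt for prompt, _ in history}
    return [c for c in candidates if c not in seen]

def score_prompt(prompt):
    """Toy scorer standing in for zero-shot accuracy of a frozen CLIP
    model using this template; here longer prompts simply score higher."""
    return len(prompt)

def optimize(rounds=3, keep=2):
    history = []  # list of (prompt, score), sorted best-first
    for _ in range(rounds):
        for cand in propose_prompts(history):
            history.append((cand, score_prompt(cand)))
        history.sort(key=lambda ps: ps[1], reverse=True)
        history = history[:keep]  # retain only the top prompts as context
    return history

best = optimize()
print(best[0][0])  # the highest-scoring template found
```

Because every candidate stays a natural-language string, the winning prompt remains directly readable by humans, in contrast to gradient-learned soft prompts.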