IPO: Interpretable Prompt Optimization for Vision-Language Models
AIM Lab, University of Amsterdam
Neural Information Processing Systems
Pre-trained vision-language models such as CLIP adapt remarkably well to a variety of downstream tasks. Nonetheless, their performance depends heavily on the specific wording of the input text prompts, which demands skillful prompt-template engineering. To avoid manual engineering, current approaches to prompt optimization learn the prompts through gradient descent, treating them as adjustable parameters. However, these methods tend to overfit the base classes seen during training and produce prompts that are no longer understandable by humans. This paper introduces a simple but interpretable prompt optimizer (IPO) that utilizes large language models (LLMs) to generate textual prompts dynamically.
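The idea described above, an LLM proposing human-readable prompt candidates that are scored on a downstream task and fed back as context, can be sketched as a simple search loop. This is an illustrative sketch, not the paper's implementation: `propose_prompts` is a stand-in for an actual LLM call, and `score_prompt` is a toy stand-in for measuring a CLIP model's zero-shot accuracy with the candidate template.

```python
# Sketch of an interpretable prompt-optimization loop: an LLM (stubbed here)
# proposes candidate prompt templates, each candidate is scored on a
# downstream task, and the best-scoring prompts are kept as context for
# the next proposal round. All names below are illustrative assumptions.

def propose_prompts(history):
    """Stand-in for an LLM call: given (prompt, score) history, return
    new candidate templates. A real system would prompt an LLM with the
    scored history and ask it to propose improved templates."""
    candidates = [
        "a photo of a {}",
        "a blurry photo of a {}",
        "an image of a {}, a type of object",
    ]
    seen = {prompt for prompt, _ in history}
    return [c for c in candidates if c not in seen]

def score_prompt(prompt):
    """Toy scorer standing in for zero-shot accuracy of a frozen CLIP
    model using this template; here longer prompts simply score higher."""
    return len(prompt)

def optimize(rounds=3, keep=2):
    history = []  # list of (prompt, score), sorted best-first
    for _ in range(rounds):
        for cand in propose_prompts(history):
            history.append((cand, score_prompt(cand)))
        history.sort(key=lambda ps: ps[1], reverse=True)
        history = history[:keep]  # retain only the top prompts as context
    return history

best = optimize()
print(best[0][0])  # the highest-scoring template found
```

Because every candidate stays a natural-language string, the winning prompt remains directly readable by humans, in contrast to gradient-learned soft prompts.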