Patch-Token Aligned Bayesian Prompt Learning for Vision-Language Models

Liu, Xinyang, Wang, Dongsheng, Li, Miaoge, Duan, Zhibin, Xu, Yishi, Chen, Bo, Zhou, Mingyuan

Mar-16-2023–arXiv.org Artificial Intelligence

There has been significant interest in constructing effective prompts for downstream applications of vision-language pre-trained models. Existing prompt-engineering approaches either require laborious manual design or treat prompt tuning as a point-estimation problem; both may fail to describe the diverse characteristics of a category, which limits their applicability. We introduce a Bayesian probabilistic approach to prompt learning in which label-specific stochastic prompts are generated hierarchically: a latent vector is first sampled from an underlying distribution and then passed through a lightweight generative model. Importantly, we semantically regularize prompt learning with visual knowledge by viewing each image and its corresponding prompts as sets of patches and tokens, respectively, and aligning them under optimal transport. This pushes the prompt tokens to faithfully capture label-specific visual concepts instead of overfitting to the training categories. Moreover, the proposed model extends straightforwardly to the conditional case, in which instance-conditional prompts are generated to improve generalizability. Extensive experiments on 15 datasets show promising transferability and generalization performance.
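The two ingredients named in the abstract, hierarchical prompt sampling and patch-token alignment under optimal transport, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the latent prior is a unit-variance Gaussian centered on a label embedding, the "lightweight generative model" is a single linear map, the transport cost is cosine distance, both marginals are uniform, and the transport plan is approximated with entropic Sinkhorn iterations. The function names (`sample_prompt_tokens`, `cosine_cost`, `sinkhorn_plan`, `ot_distance`) are hypothetical.

```python
import numpy as np


def sample_prompt_tokens(label_embedding, weight, num_tokens, rng):
    """Draw one stochastic prompt for a label (illustrative only).

    Assumption: z ~ N(label_embedding, I), followed by a linear generator
    that maps z to `num_tokens` token embeddings.
    """
    z = label_embedding + rng.standard_normal(label_embedding.shape[0])
    return (weight @ z).reshape(num_tokens, -1)


def cosine_cost(patches, tokens):
    """Pairwise cosine distance between patch and token embeddings."""
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    return 1.0 - p @ t.T


def sinkhorn_plan(cost, epsilon=0.1, num_iters=200):
    """Entropic-regularized transport plan between two uniform marginals."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass over image patches
    b = np.full(m, 1.0 / m)  # uniform mass over prompt tokens
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    for _ in range(num_iters):
        v = b / (kernel.T @ u)  # scale columns toward marginal b
        u = a / (kernel @ v)    # scale rows to match marginal a
    return u[:, None] * kernel * v[None, :]


def ot_distance(patches, tokens, epsilon=0.1, num_iters=200):
    """Transport cost between a patch set and a token set, plus the plan."""
    cost = cosine_cost(patches, tokens)
    plan = sinkhorn_plan(cost, epsilon, num_iters)
    return float(np.sum(plan * cost)), plan
```

In this sketch, a smaller `ot_distance` between an image's patch embeddings and a label's sampled prompt tokens would indicate a better match, so a training objective could use it as a regularizer or as a classification score. How the paper actually parameterizes the prior, the generator, and the transport problem is not stated in the abstract.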

large language model, machine learning, patch-token aligned bayesian prompt learning

Country:
- North America > United States
  - Texas (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
- Europe
  - Austria (0.04)
  - Switzerland > Zürich
    - Zürich (0.14)
  - Slovakia > Bratislava
    - Bratislava (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning > Neural Networks (0.46)
  - Natural Language > Large Language Model (0.34)
