Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Renrui Zhang, Xiangfei Hu, Bohao Li, Siyuan Huang, Hanqiu Deng, Hongsheng Li, Yu Qiao, Peng Gao
arXiv.org Artificial Intelligence
Visual recognition in low-data regimes requires deep neural networks to learn generalized representations from limited training samples. Recently, CLIP-based methods have shown promising few-shot performance, benefiting from contrastive language-image pre-training. We then ask whether more diverse pre-training knowledge can be cascaded to further assist few-shot representation learning. In this paper, we propose CaFo, a Cascade of Foundation models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning. CaFo combines CLIP's language-contrastive knowledge, DINO's vision-contrastive knowledge, DALL-E's vision-generative knowledge, and GPT-3's language-generative knowledge. Specifically, CaFo works by 'Prompt, Generate, then Cache'. First, we leverage GPT-3 to produce textual inputs for prompting CLIP with rich downstream linguistic semantics. Then, we generate synthetic images via DALL-E to expand the few-shot training data without any manual effort. Finally, we introduce a learnable cache model to adaptively blend the predictions of CLIP and DINO. Through this collaboration, CaFo fully unleashes the potential of the different pre-training methods and unifies them to achieve state-of-the-art few-shot classification performance. Code is available at https://github.com/ZrrSkywalker/CaFo.
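The cache step described in the abstract can be illustrated with a minimal sketch in the style of key-value cache models: few-shot image features serve as keys, their one-hot labels as values, and a test feature's affinity to the keys produces cache logits that are blended with zero-shot CLIP logits. All names, the `alpha`/`beta` hyperparameters, and the affinity form here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Unit-normalize feature vectors so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def cache_blend(test_feat, text_feats, cache_keys, cache_vals, alpha=1.0, beta=5.5):
    """Blend zero-shot CLIP logits with cache-model logits (hypothetical sketch).

    test_feat:   (d,)   normalized test image feature
    text_feats:  (C, d) normalized class text features (CLIP prompts)
    cache_keys:  (K, d) normalized few-shot training features
    cache_vals:  (K, C) one-hot labels of the few-shot samples
    """
    clip_logits = 100.0 * test_feat @ text_feats.T            # zero-shot prediction
    affinity = np.exp(-beta * (1.0 - test_feat @ cache_keys.T))  # similarity to cached keys
    cache_logits = affinity @ cache_vals                      # label-weighted vote
    return clip_logits + alpha * cache_logits                 # blended class logits

# Toy usage with random features (illustrative only)
rng = np.random.default_rng(0)
d, C, K = 8, 3, 6
text_feats = l2_normalize(rng.normal(size=(C, d)))
cache_keys = l2_normalize(rng.normal(size=(K, d)))
cache_vals = np.eye(C)[rng.integers(0, C, size=K)]
test_feat = l2_normalize(rng.normal(size=d))
logits = cache_blend(test_feat, text_feats, cache_keys, cache_vals)
```

In the paper's setting the cache would hold features from both the original few-shot images and the DALL-E-generated ones, with the blending weights learned rather than fixed.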
Mar-3-2023