Object-level Vision-Language Contrastive Pre-training
Since the emergence of CLIP in 2021, contrastive pre-training has been extended to supervised learning on image-text pairs. The goal of this extension is to improve the transferability of the learned visual features by aligning them with the corresponding text features, since text features had proved highly transferable in models such as GPT-3. With this extension, visual models can be adapted to downstream tasks in a zero-/few-shot manner with good performance and data efficiency. The original CLIP pre-training is image-level, and the learned features are mainly used for image-level downstream tasks, e.g.
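The alignment described above is typically trained with a symmetric contrastive (InfoNCE) objective: within a batch of matched image-text pairs, each image embedding should be most similar to its own caption's embedding, and vice versa. The sketch below illustrates that loss with NumPy; the function name, temperature value, and embedding shapes are illustrative assumptions, not CLIP's actual implementation.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings (a sketch).

    image_emb, text_emb: (N, D) arrays; row i of each forms a matched pair.
    temperature: assumed scaling constant (CLIP learns this parameter).
    """
    # L2-normalize so the dot product is cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (N, N) similarity matrix
    labels = np.arange(len(logits))          # matched pairs lie on the diagonal

    def cross_entropy(l):
        # Row-wise softmax cross-entropy against the diagonal targets,
        # with the usual max-subtraction for numerical stability.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

When the two embedding sets are well aligned (diagonal similarities dominate), the loss approaches zero; for unrelated embeddings it sits near log N, which is what drives the encoders toward the shared image-text space.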
Sep-23-2022, 18:16:42 GMT