PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining

Yuting Gao

Neural Information Processing Systems 

Large-scale vision-language pre-training has achieved promising results on downstream tasks.
