PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
