PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
Yuting Gao
