FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
