UniCLIP: Unified Framework for Contrastive Language-Image Pre-training

Dec-23-2025, 17:32:37 GMT–Neural Information Processing Systems

Pre-training vision-language models with contrastive objectives has shown promising results: it scales to large uncurated datasets and transfers to many downstream applications. Subsequent works have aimed to improve data efficiency by adding self-supervision terms, but in those works the inter-domain (image-text) contrastive loss and the intra-domain (image-image) contrastive loss are defined on separate spaces, so many feasible combinations of supervision are overlooked. To overcome this issue, we propose UniCLIP, a Unified framework for Contrastive Language-Image Pre-training.
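The abstract does not give UniCLIP's actual loss, so the NumPy sketch below is only a hypothetical illustration of the distinction it draws, not the paper's formulation. `separate_losses` mimics the prior setup: an image-text term and an image-image term, each computed in isolation. `unified_loss` instead pools both image views and the caption into one embedding space and applies a multi-positive InfoNCE, so every same-sample pair (view-view, view-text) supervises every other. The function names, the shared `info_nce` helper, and the temperature of 0.1 are all assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Project each row onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(anchors, candidates, positive_mask, temperature=0.1, exclude=None):
    """Multi-positive InfoNCE (illustrative): for each anchor, average the negative
    log-softmax probability of each positive candidate."""
    logits = anchors @ candidates.T / temperature
    if exclude is not None:
        # Drop excluded pairs (e.g. an anchor with itself) from the softmax.
        logits = np.where(exclude, -1e9, logits)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_anchor = -(log_prob * positive_mask).sum(axis=1) / positive_mask.sum(axis=1)
    return float(per_anchor.mean())


def separate_losses(views_a, views_b, txt, temperature=0.1):
    # Prior-style setup (assumed): each objective only sees its own pairs.
    # Real models would use separate projection heads; raw arrays stand in here.
    # Note that views_b is never contrasted against the text.
    eye = np.eye(len(txt))
    a, b, t = l2_normalize(views_a), l2_normalize(views_b), l2_normalize(txt)
    inter = 0.5 * (info_nce(a, t, eye, temperature) + info_nce(t, a, eye, temperature))
    intra = 0.5 * (info_nce(a, b, eye, temperature) + info_nce(b, a, eye, temperature))
    return inter + intra


def unified_loss(views_a, views_b, txt, temperature=0.1):
    # Hypothetical unified variant: all embeddings share one space, and every
    # other embedding of the same sample (either modality) counts as a positive.
    z = l2_normalize(np.concatenate([views_a, views_b, txt], axis=0))
    n = len(txt)
    ids = np.tile(np.arange(n), 3)  # sample index of each stacked row
    positive_mask = (ids[:, None] == ids[None, :]).astype(float)
    self_mask = np.eye(3 * n, dtype=bool)
    positive_mask[self_mask] = 0.0  # an embedding is not its own positive
    return info_nce(z, z, positive_mask, temperature, exclude=self_mask)
```

With 4 samples, each anchor in `unified_loss` has two positives (the other view and the caption, or both views) and nine negatives, whereas each term in `separate_losses` gives an anchor one positive drawn from a single pair type.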

contrastive language-image pre-training, uniclip, unified framework
