Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks
Neural Information Processing Systems
Large-scale vision-language models such as CLIP (Radford et al., 2021) and ALIGN (Jia et al., 2021) are trained using a multimodal contrastive
Feb-8-2026, 22:15:07 GMT