Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks
Neural Information Processing Systems
Large-scale vision-language models such as CLIP (Radford et al., 2021) and ALIGN (Jia et al., 2021) are trained using a multimodal contrastive loss.
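A multimodal contrastive loss treats each matched image-caption pair in a batch as a positive and every other pairing in the batch as a negative. The snippet below is a minimal, dependency-free sketch of the symmetric InfoNCE form popularized by CLIP. It is not the training code of any of the cited models. The function names (`clip_contrastive_loss`, `l2_normalize`, `log_softmax_at`) are hypothetical, and the fixed temperature of 0.07 is only an illustrative default. In practice the temperature is typically learned, and embeddings come from image and text encoders rather than hand-written vectors.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax_at(logits: List[float], idx: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at position `idx`."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[idx] - log_z


def clip_contrastive_loss(image_emb: List[Vector], text_emb: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric contrastive loss; pair i is (image_emb[i], text_emb[i])."""
    n = len(image_emb)
    assert n == len(text_emb) and n > 0, "need the same nonzero number of images and captions"
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    # logits[i][j] = cosine(image_i, caption_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: each row classifies which caption belongs to image i (target on the diagonal).
    loss_i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column classifies which image belongs to caption j.
    loss_t2i = -sum(log_softmax_at([logits[i][j] for i in range(n)], j)
                    for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
    swapped = clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
    print(f"matched: {matched:.4f}, swapped: {swapped:.4f}")
```

Correctly aligned pairs drive the loss toward zero, while swapping the captions makes it large. The objective simply pulls each image toward whatever caption it is paired with. As a result, an image-caption pair injected into the training data is optimized in exactly the same way as a clean one.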
Oct-8-2025, 06:49:47 GMT