Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models
