DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
