Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction
