Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction