Multimodal Data Augmentation for Image Captioning using Diffusion Models
