Multimodal Data Augmentation for Image Captioning using Diffusion Models