Towards Diverse and Efficient Audio Captioning via Diffusion Models
