Towards Diverse and Efficient Audio Captioning via Diffusion Models