AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation