MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens