VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
