Scaling Speech-Text Pre-training with Synthetic Interleaved Data
