Scaling Speech-Text Pre-training with Synthetic Interleaved Data