VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation