MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation
Shih-Lun Wu, Yoon Kim, Cheng-Zhi Anna Huang
arXiv.org Artificial Intelligence
We present MIDI-LLM, an LLM for generating multitrack MIDI music from free-form text prompts. Our approach expands a text LLM's vocabulary to include MIDI tokens and uses a two-stage training recipe to endow it with text-to-MIDI abilities. Because the original LLM's parameter structure is preserved, the vLLM library can be used directly for accelerated inference. Experiments show that MIDI-LLM achieves higher quality, better text control, and faster inference than the recent Text2midi model. A live demo is available at https://midi-llm-demo.vercel.app.
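To make the vocabulary-expansion idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function `expand_vocabulary`, the toy vocabulary, the MIDI token names, and the mean-plus-noise initialization of the new embedding rows are all assumptions chosen for illustration. The point it shows is that new tokens are appended after the existing ones, so the original token ids and embedding rows stay unchanged.

```python
import random


def expand_vocabulary(vocab, embeddings, new_tokens, seed=0):
    """Append new tokens to a vocabulary and add one embedding row per token.

    Existing ids and rows are copied unchanged, so the original layout is
    preserved apart from the extra rows at the end.
    Assumes ids in `vocab` are contiguous, starting at 0.
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    # Mean of the existing rows. Initializing new rows near this mean is a
    # common heuristic and an assumption here, not the paper's stated method.
    mean = [sum(row[j] for row in embeddings) / len(embeddings) for j in range(dim)]

    new_vocab = dict(vocab)
    new_embeddings = [list(row) for row in embeddings]
    for token in new_tokens:
        if token in new_vocab:
            continue  # skip tokens that are already in the vocabulary
        new_vocab[token] = len(new_vocab)  # next free id
        new_embeddings.append([m + rng.gauss(0.0, 0.01) for m in mean])
    return new_vocab, new_embeddings


# Toy text vocabulary with a 4-dimensional embedding table.
text_vocab = {"<pad>": 0, "a": 1, "jazz": 2, "tune": 3}
text_emb = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.1, 0.0, 0.2],
    [0.3, 0.3, 0.1, 0.0],
]

# Hypothetical MIDI token names, not the paper's actual tokenization.
midi_tokens = [f"<pitch_{p}>" for p in range(60, 64)]
midi_tokens += ["<time_shift_10ms>", "<instr_piano>"]

vocab, emb = expand_vocabulary(text_vocab, text_emb, midi_tokens)
print(len(text_vocab), "->", len(vocab))
```

In a real model the output projection would be extended in the same way, and the new rows would then be learned during training.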
Nov-7-2025
- Country:
- North America > United States > Massachusetts (0.14)
- Genre:
- Research Report (0.40)
- Industry:
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Technology: