Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation
Wang, Tongxi; Yu, Yang; Wang, Qing; Qian, Junlang
arXiv.org Artificial Intelligence
Song generation is widely regarded as the most challenging problem in music AIGC, and existing approaches have yet to overcome persistent limitations in four areas: controllability, generalizability, perceptual quality, and duration. We argue that these shortcomings stem primarily from the prevailing paradigm of learning music theory directly from raw audio, a task that remains prohibitively difficult for current models. To address this, we present the Bar-level AI Composing Helper (BACH), the first model explicitly designed to generate songs through human-editable symbolic scores. BACH introduces a tokenization strategy and a symbolic generative procedure tailored to hierarchical song structure, which yield substantial gains in the efficiency, duration, and perceptual quality of song generation. Experiments demonstrate that BACH, despite its small model size, establishes a new state of the art among all publicly reported song generation systems, even surpassing commercial solutions such as Suno. Human evaluations further confirm its superiority across multiple subjective metrics.
Aug-5-2025
- Country:
- Genre:
  - Research Report (1.00)
- Industry:
  - Leisure & Entertainment (1.00)
  - Media > Music (1.00)
- Technology: