Benchmarking Music Generation Models and Metrics via Human Preference Studies
Grötschla, Florian, Solak, Ahmet, Lanzendörfer, Luca A., Wattenhofer, Roger
arXiv.org Artificial Intelligence
Recent advancements have brought generated music closer to human-created compositions, yet evaluating these models remains challenging. While human preference is the gold standard for assessing quality, translating these subjective judgments into objective metrics, particularly for text-audio alignment and music quality, has proven difficult. In this work, we generate 6k songs using 12 state-of-the-art models and conduct a survey of 15k pairwise audio comparisons with 2.5k human participants to evaluate the correlation between human preferences and widely used metrics. To the best of our knowledge, this work is the first to rank current state-of-the-art music generation models and metrics based on human preference. To further the field of subjective metric evaluation, we provide open access to our dataset of generated music and human evaluations.
Jun-25-2025
- Genre:
- Research Report > New Finding (0.94)
- Industry:
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Technology: