Foundation Models for Music: A Survey
Ma, Yinghao, Øland, Anders, Ragni, Anton, Del Sette, Bleiz MacSen, Saitis, Charalampos, Donahue, Chris, Lin, Chenghua, Plachouras, Christos, Benetos, Emmanouil, Shatri, Elona, Morreale, Fabio, Zhang, Ge, Fazekas, György, Xia, Gus, Zhang, Huan, Manco, Ilaria, Huang, Jiawen, Guinot, Julien, Lin, Liwei, Marinelli, Luca, Lam, Max W. Y., Sharma, Megha, Kong, Qiuqiang, Dannenberg, Roger B., Yuan, Ruibin, Wu, Shangda, Wu, Shih-Lun, Dai, Shuqi, Lei, Shun, Kang, Shiyin, Dixon, Simon, Chen, Wenhu, Huang, Wenhao, Du, Xingjian, Qu, Xingwei, Tan, Xu, Li, Yizhi, Tian, Zeyue, Wu, Zhiyong, Wu, Zhizheng, Ma, Ziyang, Wang, Ziyu
arXiv.org Artificial Intelligence
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning representation learning, generative learning and multimodal learning. We first contextualise the significance of music in various industries and trace the evolution of AI in music. By delineating the modalities targeted by foundation models, we find that many music representations are underexplored in FM development. We then emphasise the limited versatility of previous methods across diverse music applications, along with the potential of FMs in music understanding, generation and medical applications. By comprehensively exploring the details of the model pre-training paradigm, architectural choices, tokenisation, finetuning methodologies and controllability, we highlight important topics that deserve more thorough exploration, such as instruction tuning and in-context learning, scaling laws and emergent abilities, and long-sequence modelling. A dedicated section presents insights into music agents, accompanied by a thorough analysis of the datasets and evaluations essential for pre-training and downstream tasks. Finally, underscoring the vital importance of ethical considerations, we advocate that future research on FMs for music should focus more on issues such as interpretability, transparency, human responsibility, and copyright. The paper offers insights into future challenges and trends in FMs for music, aiming to shape the trajectory of human-AI collaboration in the music realm.
Sep-3-2024
- Country:
- Africa > Rwanda
- Asia
- Japan > Honshū
- Kantō
- Kanagawa Prefecture > Yokohama (0.04)
- Tokyo Metropolis Prefecture > Tokyo (0.04)
- Indonesia > Bali (0.04)
- Malaysia > Kuala Lumpur
- Kuala Lumpur (0.04)
- India > Karnataka
- Bengaluru (0.04)
- Middle East
- Israel > Tel Aviv District
- Tel Aviv (0.04)
- Jordan (0.04)
- China
- South Korea
- Singapore (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- Europe
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Portugal > Braga
- Braga (0.04)
- Greece (0.04)
- Italy
- Calabria > Catanzaro Province
- Catanzaro (0.04)
- Lombardy > Milan (0.04)
- France
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- East Sussex > Brighton (0.04)
- Greater London > London (0.04)
- Surrey > Guildford (0.04)
- Romania > Sud - Muntenia Development Region
- Giurgiu County > Giurgiu (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Netherlands
- North Holland > Amsterdam (0.04)
- South Holland > Delft (0.04)
- Spain
- Andalusia > Málaga Province
- Málaga (0.04)
- Catalonia > Barcelona Province
- Barcelona (0.04)
- Galicia > Madrid (0.04)
- Germany
- Bavaria > Upper Bavaria
- Munich (0.04)
- Hamburg (0.04)
- Bulgaria > Varna Province
- Varna (0.04)
- Austria
- Sweden > Stockholm
- Stockholm (0.04)
- North America
- Canada
- Puerto Rico > Peñuelas
- Peñuelas (0.04)
- United States
- New York > New York County
- New York City (0.04)
- California
- Los Angeles County > Long Beach (0.13)
- San Francisco County > San Francisco (0.13)
- Santa Clara County > Palo Alto (0.04)
- District of Columbia > Washington (0.04)
- Washington > King County
- Seattle (0.04)
- Rhode Island (0.04)
- Tennessee (0.04)
- Michigan (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.13)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Maryland > Baltimore (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.13)
- Oceania > New Zealand
- North Island > Auckland Region > Auckland (0.04)
- South America
- Genre:
- Instructional Material (1.00)
- Overview > Innovation (0.67)
- Research Report
- Experimental Study (1.00)
- Promising Solution (1.00)
- Industry:
- Education > Curriculum
- Subject-Specific Education (0.67)
- Government (1.00)
- Health & Medicine
- Consumer Health (0.92)
- Therapeutic Area
- Neurology (1.00)
- Psychiatry/Psychology > Mental Health (0.67)
- Information Technology > Security & Privacy (0.92)
- Law
- Intellectual Property & Technology Law (1.00)
- Statutes (0.67)
- Leisure & Entertainment (1.00)
- Media > Music (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Issues > Social & Ethical Issues (1.00)
- Machine Learning
- Inductive Learning (0.92)
- Learning Graphical Models > Undirected Networks
- Markov Models (0.67)
- Neural Networks > Deep Learning (1.00)
- Statistical Learning (1.00)
- Natural Language
- Chatbot (1.00)
- Large Language Model (1.00)
- Text Processing (0.92)
- Representation & Reasoning
- Agents (1.00)
- Personal Assistant Systems (1.00)
- Speech
- Acoustic Processing (1.00)
- Speech Recognition (1.00)