Foundation Models for Music: A Survey

Ma, Yinghao; Øland, Anders; Ragni, Anton; Del Sette, Bleiz MacSen; Saitis, Charalampos; Donahue, Chris; Lin, Chenghua; Plachouras, Christos; Benetos, Emmanouil; Shatri, Elona; Morreale, Fabio; Zhang, Ge; Fazekas, György; Xia, Gus; Zhang, Huan; Manco, Ilaria; Huang, Jiawen; Guinot, Julien; Lin, Liwei; Marinelli, Luca; Lam, Max W. Y.; Sharma, Megha; Kong, Qiuqiang; Dannenberg, Roger B.; Yuan, Ruibin; Wu, Shangda; Wu, Shih-Lun; Dai, Shuqi; Lei, Shun; Kang, Shiyin; Dixon, Simon; Chen, Wenhu; Huang, Wenhao; Du, Xingjian; Qu, Xingwei; Tan, Xu; Li, Yizhi; Tian, Zeyue; Wu, Zhiyong; Wu, Zhizheng; Ma, Ziyang; Wang, Ziyu

Sep-3-2024–arXiv.org Artificial Intelligence

In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning representation learning, generative learning and multimodal learning. We first contextualise the significance of music in various industries and trace the evolution of AI in music. By delineating the modalities targeted by foundation models, we find that many music representations are underexplored in FM development. We then highlight the lack of versatility of previous methods across diverse music applications, along with the potential of FMs in music understanding, generation and medical applications. By comprehensively exploring the details of the model pre-training paradigm, architectural choices, tokenisation, finetuning methodologies and controllability, we emphasise important topics that deserve more thorough exploration than they have received, such as instruction tuning and in-context learning, scaling laws and emergent abilities, and long-sequence modelling. A dedicated section presents insights into music agents, accompanied by a thorough analysis of datasets and evaluations essential for pre-training and downstream tasks. Finally, underscoring the vital importance of ethical considerations, we advocate that future research on FMs for music focus more on issues such as interpretability, transparency, human responsibility and copyright. The paper offers insights into future challenges and trends in FMs for music, aiming to shape the trajectory of human-AI collaboration in the music realm.

audio-visual joint representation learning, pattern analysis and machine intelligence, tsinghua shenzhen international graduate school

Country:
- South America
  - Colombia > Bolivar Department
    - Cartagena (0.04)
  - Chile > Santiago Metropolitan Region
    - Santiago Province > Santiago (0.04)
- Oceania > New Zealand
  - North Island > Auckland Region > Auckland (0.04)
- North America
  - United States
    - Rhode Island (0.04)
    - Maryland > Baltimore (0.04)
    - Michigan (0.04)
    - Tennessee (0.04)
    - District of Columbia > Washington (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.13)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.13)
    - Illinois > Cook County
      - Chicago (0.04)
    - Washington > King County
      - Seattle (0.04)
    - California
      - San Francisco County > San Francisco (0.13)
      - Los Angeles County > Long Beach (0.13)
      - Santa Clara County > Palo Alto (0.04)
    - New York > New York County
      - New York City (0.04)
  - Puerto Rico > Peñuelas
    - Peñuelas (0.04)
  - Canada
    - Ontario > Toronto (0.14)
    - Quebec > Montreal (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.13)
    - Alberta
      - Census Division No. 6 > Calgary Metropolitan Region
        - Calgary (0.04)
      - Census Division No. 15 > Improvement District No. 9
        - Banff (0.04)
- Europe
  - Greece (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Austria
    - Vienna (0.13)
    - Styria > Graz (0.04)
  - Bulgaria > Varna Province
    - Varna (0.04)
  - Germany
    - Hamburg (0.04)
    - Bavaria > Upper Bavaria
      - Munich (0.04)
  - Spain
    - Community of Madrid > Madrid (0.04)
    - Catalonia > Barcelona Province
      - Barcelona (0.04)
    - Andalusia > Málaga Province
      - Málaga (0.04)
  - Netherlands
    - South Holland > Delft (0.04)
    - North Holland > Amsterdam (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - United Kingdom > England
    - East Sussex > Brighton (0.04)
    - Surrey > Guildford (0.04)
    - Greater London > London (0.04)
    - Cambridgeshire > Cambridge (0.04)
  - France
    - Île-de-France > Paris
      - Paris (0.04)
    - Auvergne-Rhône-Alpes
      - Lyon > Lyon (0.04)
      - Isère > Grenoble (0.04)
  - Italy
    - Lombardy > Milan (0.04)
    - Calabria > Catanzaro Province
      - Catanzaro (0.04)
  - Portugal > Braga
    - Braga (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - Singapore (0.04)
  - Indonesia > Bali (0.04)
  - Taiwan > Taiwan Province
    - Taipei (0.04)
  - South Korea
    - Incheon > Incheon (0.04)
    - Seoul > Seoul (0.04)
  - China
    - Guangdong Province > Shenzhen (0.04)
    - Hong Kong (0.04)
    - Shanghai > Shanghai (0.04)
    - Beijing > Beijing (0.04)
  - Middle East
    - Jordan (0.04)
    - Israel > Tel Aviv District
      - Tel Aviv (0.04)
  - India > Karnataka
    - Bengaluru (0.04)
  - Malaysia > Kuala Lumpur
    - Kuala Lumpur (0.04)
  - Japan > Honshū
    - Kantō
      - Tokyo Metropolis Prefecture > Tokyo (0.04)
      - Kanagawa Prefecture > Yokohama (0.04)
- Africa > Rwanda
  - Kigali > Kigali (0.04)

Genre:
- Instructional Material (1.00)
- Overview > Innovation (0.67)
- Research Report
  - Promising Solution (1.00)
  - Experimental Study (1.00)

Industry:
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Government (1.00)
- Information Technology > Security & Privacy (0.92)
- Law
  - Intellectual Property & Technology Law (1.00)
  - Statutes (0.67)
- Health & Medicine
  - Consumer Health (0.92)
  - Therapeutic Area
    - Neurology (1.00)
    - Psychiatry/Psychology > Mental Health (0.67)
- Education > Curriculum
  - Subject-Specific Education (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Issues > Social & Ethical Issues (1.00)
  - Speech
    - Speech Recognition (1.00)
    - Acoustic Processing (1.00)
  - Representation & Reasoning
    - Personal Assistant Systems (1.00)
    - Agents (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
    - Text Processing (0.92)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (1.00)
    - Inductive Learning (0.92)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.67)
