Wu, Yusong
ReaLJam: Real-Time Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers
Scarlatos, Alexander, Wu, Yusong, Simon, Ian, Roberts, Adam, Cooijmans, Tim, Jaques, Natasha, Tarakajian, Cassie, Huang, Cheng-Zhi Anna
Recent advances in generative artificial intelligence (AI) have produced models capable of generating high-quality musical content. However, little consideration has been given to how these models can be used in real-time or cooperative jamming applications, which impose crucial requirements: low latency, the ability to communicate planned actions, and the ability to adapt to user input in real time. To support these needs, we introduce ReaLJam, an interface and protocol for live musical jamming sessions between a human and a Transformer-based AI agent trained with reinforcement learning. We enable real-time interaction through the concept of anticipation, where the agent continually predicts how the performance will unfold and visually conveys its plan to the user. We conduct a user study in which experienced musicians jam in real time with the agent through ReaLJam. Our results demonstrate that ReaLJam enables enjoyable and musically interesting sessions, and we uncover important takeaways for future work.
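As a rough illustration of the anticipation idea described in the abstract, the sketch below runs a toy jamming loop in which an agent re-predicts a short plan of upcoming notes at every step, surfaces that plan, and commits only the imminent note. The model interface (`predict_next_notes`), the note encoding, and the timing constants are hypothetical stand-ins, not ReaLJam's actual protocol.

```python
# Toy anticipation loop (hypothetical, not the ReaLJam implementation):
# at every step the agent re-plans a few future notes from the history so
# far, shows the plan, and commits only the note whose deadline has arrived.
import random
import time

HORIZON = 4          # how many future notes the agent plans ahead (assumed)
STEP_SECONDS = 0.5   # time between committed notes (assumed tempo)

def predict_next_notes(history, horizon):
    """Stand-in for a Transformer policy: return `horizon` planned MIDI pitches."""
    last = history[-1] if history else 60
    return [last + random.choice([-2, 0, 2, 4]) for _ in range(horizon)]

def jam(num_steps=8):
    history = []                        # notes played so far (human + agent)
    for step in range(num_steps):
        user_note = 60 + (step % 5)     # placeholder for live user input
        history.append(user_note)

        plan = predict_next_notes(history, HORIZON)   # anticipate the future
        print(f"step {step}: user={user_note}, agent plan={plan}")

        history.append(plan[0])         # only the imminent note is committed
        time.sleep(STEP_SECONDS)        # wait for the next beat deadline

if __name__ == "__main__":
    jam()
```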
The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation
Manco, Ilaria, Weck, Benno, Doh, SeungHeon, Won, Minz, Zhang, Yixiao, Bogdanov, Dmitry, Wu, Yusong, Chen, Ke, Tovstogan, Philip, Benetos, Emmanouil, Quinton, Elio, Fazekas, György, Nam, Juhan
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Commons licenses. To showcase the use of our dataset, we benchmark popular models on three key music-and-language tasks (music captioning, text-to-music generation and music-language retrieval). Our experiments highlight the importance of cross-dataset evaluation and offer insights into how researchers can use SDD to gain a broader understanding of model performance.
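As a hedged illustration of the retrieval task such a corpus supports, the sketch below computes Recall@K over toy caption and audio embeddings. In practice the embeddings would come from a joint text-audio model such as CLAP and the caption-to-track pairing from the dataset's metadata; neither is shown here, and the one-caption-per-track setup is a simplification.

```python
# Recall@K for text-to-music retrieval on toy data (illustrative only).
import numpy as np

def recall_at_k(text_emb, audio_emb, k=10):
    """Fraction of captions whose paired audio ranks in the top-k by cosine similarity."""
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    audio_emb = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    sims = text_emb @ audio_emb.T               # (num_captions, num_tracks)
    ranks = np.argsort(-sims, axis=1)           # best match first
    hits = [i in ranks[i, :k] for i in range(len(text_emb))]
    return float(np.mean(hits))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(706, 512))          # toy caption embeddings
    audio = text + 0.1 * rng.normal(size=(706, 512))  # noisy paired audio embeddings
    print(f"Recall@10: {recall_at_k(text, audio, k=10):.3f}")
```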
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Chen, Ke, Wu, Yusong, Liu, Haohe, Nezhurina, Marianna, Berg-Kirkpatrick, Taylor, Dubnov, Shlomo
Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to the limited availability of music data and sensitive issues related to copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text-to-music model, MusicLDM, that adapts Stable Diffusion and AudioLDM architectures to the music domain. We achieve this by retraining the contrastive language-audio pretraining model (CLAP) and the HiFi-GAN vocoder, as components of MusicLDM, on a collection of music data samples. Then, to address the limitations of training data and to avoid plagiarism, we leverage a beat tracking model and propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup, which recombine training audio directly or via a latent embedding space, respectively. Such mixup strategies encourage the model to interpolate between musical training samples and generate new music within the convex hull of the training data, making the generated music more diverse while still staying faithful to the corresponding style. In addition to popular evaluation metrics, we design several new evaluation metrics based on CLAP score to demonstrate that our proposed MusicLDM and beat-synchronous mixup strategies improve both the quality and novelty of generated music, as well as the correspondence between input text and generated music.
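The mixup interpolation at the core of the proposed augmentation can be sketched as follows. This is not the authors' code: the downbeat offsets are assumed rather than produced by a real beat tracker, and latent mixup would apply the same Beta-weighted blend to encoder latents instead of raw waveforms.

```python
# Minimal sketch of beat-synchronous mixup: align two clips at their
# downbeats, then blend them with a Beta-sampled interpolation weight.
import numpy as np

def beat_synchronous_mixup(wave_a, wave_b, downbeat_a, downbeat_b, alpha=0.5):
    """Blend two waveforms after shifting so their downbeats coincide."""
    shift = downbeat_b - downbeat_a
    wave_b = np.roll(wave_b, -shift)              # align clip B's downbeat to A's
    lam = np.random.beta(alpha, alpha)            # mixup interpolation weight
    length = min(len(wave_a), len(wave_b))
    return lam * wave_a[:length] + (1.0 - lam) * wave_b[:length], lam

if __name__ == "__main__":
    sr = 16000
    a = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)   # 1 s of A3
    b = np.sin(2 * np.pi * 330 * np.arange(sr) / sr)   # 1 s of E4
    mixed, lam = beat_synchronous_mixup(a, b, downbeat_a=0, downbeat_b=400)
    print(f"lambda={lam:.2f}, mixed length={len(mixed)}")
```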