

A Study on the Data Distribution Gap in Music Emotion Recognition

Ching, Joann, Widmer, Gerhard

arXiv.org Artificial Intelligence

Music Emotion Recognition (MER) is a task deeply connected to human perception, relying heavily on subjective annotations collected from contributors. Prior studies tend to focus on specific musical styles rather than incorporating a diverse range of genres, such as rock and classical, within a single framework. In this paper, we address the task of recognizing emotion from audio content by investigating five datasets with dimensional emotion annotations -- EmoMusic, DEAM, PMEmo, WTC, and WCMED -- which span various musical styles. We demonstrate the problem of out-of-distribution generalization in a systematic experiment. By closely looking at multiple data and feature sets, we provide insight into genre-emotion relationships in existing data and examine potential genre dominance and dataset biases in certain feature representations. Based on these experiments, we arrive at a simple yet effective framework that combines embeddings extracted from the Jukebox model with chroma features and demonstrate how, alongside a combination of several diverse training sets, this permits us to train models with substantially improved cross-dataset generalization capabilities.
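The framework described above combines Jukebox embeddings with chroma features. A minimal sketch of such a fusion, assuming mean-pooling over time followed by concatenation (the embedding dimensionality and pooling choice here are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

# Hypothetical dimensions: a high-dimensional Jukebox embedding per frame
# and a 12-d chroma vector per frame. Real extraction would come from the
# pretrained Jukebox model and a chroma extractor such as librosa.
JUKEBOX_DIM, CHROMA_DIM = 4800, 12

def combine_features(jukebox_emb, chroma):
    """Mean-pool each (frames x dims) feature over time, then concatenate."""
    return np.concatenate([jukebox_emb.mean(axis=0), chroma.mean(axis=0)])

# Dummy stand-ins for real extracted features (frames x dims).
jukebox_emb = np.random.randn(30, JUKEBOX_DIM)
chroma = np.random.rand(30, CHROMA_DIM)

x = combine_features(jukebox_emb, chroma)
print(x.shape)  # (4812,)
```

The concatenated vector would then feed a regressor predicting the dimensional (valence/arousal) annotations.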


A Novel Audio Representation for Music Genre Identification in MIR

Kamuni, Navin, Jindal, Mayank, Soni, Arpita, Mallreddy, Sukender Reddy, Macha, Sharath Chandra

arXiv.org Artificial Intelligence

For Music Information Retrieval (MIR) downstream tasks, the most common audio representation is time-frequency based, such as Mel spectrograms. This study explores the potential of a new form of audio representation for one of the most common MIR downstream tasks: musical genre identification. To this end, a novel audio representation was created by discretely encoding music with deep vector quantization, as used in the generative music model Jukebox. The effectiveness of Jukebox's audio representation is compared to Mel spectrograms using a dataset that is nearly equivalent to the state of the art (SOTA) and a nearly identical transformer design. The results of this study imply that, at least when the transformers are pretrained on a modest dataset of 20k tracks, Jukebox's audio representation is not superior to Mel spectrograms. This may be because Jukebox's audio representation does not sufficiently account for the peculiarities of human auditory perception, whereas Mel spectrograms are created specifically with the human auditory sense in mind.
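The perceptual motivation behind Mel spectrograms can be seen in the mel scale itself, which compresses high frequencies the way human pitch perception does. A small illustration using the standard HTK formula:

```python
import numpy as np

def hz_to_mel(f):
    # Standard (HTK) mel scale: roughly linear below 1 kHz, logarithmic
    # above, mirroring how human pitch perception compresses highs.
    return 2595.0 * np.log10(1.0 + f / 700.0)

# By construction, 1000 Hz maps to roughly 1000 mel, and an equal
# 1 kHz step in Hz shrinks on the mel axis as frequency rises.
for f in (1000, 2000, 7000, 8000):
    print(f, round(float(hz_to_mel(f)), 1))
```

A Mel spectrogram pools an FFT's linear frequency bins into bands spaced evenly on this scale, which is exactly the perceptual prior that a learned discrete representation like Jukebox's lacks.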


Jukebox

#artificialintelligence

This has led to impressive results like producing Bach chorales and polyphonic music with multiple instruments, as well as minute-long musical pieces. But symbolic generators have limitations: they cannot capture human voices or many of the subtler timbres, dynamics, and expressivity that are essential to music. A different approach is to model music directly as raw audio. For comparison, GPT-2 had 1,000 timesteps and OpenAI Five took tens of thousands of timesteps per game. Thus, to learn the high-level semantics of music, a model would have to deal with extremely long-range dependencies.
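The scale gap mentioned above is easy to make concrete: a few minutes of CD-quality raw audio already contains millions of timesteps, orders of magnitude more than GPT-2's text context.

```python
# How many timesteps a model must relate when working on raw audio.
sample_rate = 44_100          # CD-quality samples per second
seconds = 4 * 60              # a typical ~4-minute song
timesteps = sample_rate * seconds

# Roughly 10^7 steps for one song, versus ~10^3 for GPT-2's context
# and tens of thousands per game for OpenAI Five.
print(timesteps)
```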


Unleashing the Power of AI in Music: A Deep Dive into Jukebox by OpenAI

#artificialintelligence

Jukebox, an innovative AI system created by OpenAI, leverages the power of deep learning to generate music, complete with lyrics and vocals, in a variety of genres and styles. By training on a dataset of 1.2 million songs, Jukebox showcases an unparalleled level of sophistication in music generation, pushing the boundaries of what AI can achieve in the creative arts. At the core of Jukebox lies a neural network architecture known as a Vector Quantized Variational Autoencoder (VQ-VAE). The VQ-VAE's role is to encode and decode the complex musical information found within the training dataset. This encoding-decoding process enables Jukebox to generate novel and diverse musical compositions by sampling from the latent space, a mathematical representation of the underlying structure of the dataset.
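The quantization step at the heart of a VQ-VAE can be sketched in a few lines: each continuous encoder output is snapped to its nearest codebook entry, yielding discrete tokens. The sizes below are illustrative only, not Jukebox's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((512, 64))   # 512 codes, 64-d each
encoded = rng.standard_normal((10, 64))     # 10 encoder output frames

# Squared distance from every frame to every code, then argmin per frame.
d2 = ((encoded[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
tokens = d2.argmin(axis=1)                  # discrete token ids
quantized = codebook[tokens]                # what the decoder receives
print(tokens.shape, quantized.shape)
```

Sampling from the model then amounts to generating sequences of these token ids and decoding them back to audio.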


Melody transcription via generative pre-training

Donahue, Chris, Thickstun, John, Liang, Percy

arXiv.org Artificial Intelligence

Despite the central role that melody plays in music perception, it remains an open challenge in music information retrieval to reliably detect the notes of the melody present in an arbitrary music recording. A key challenge in melody transcription is building methods which can handle broad audio containing any number of instrument ensembles and musical styles - existing strategies work well for some melody instruments or styles but not all. To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio, thereby improving performance on melody transcription by 20% relative to conventional spectrogram features. Another obstacle in melody transcription is a lack of training data - we derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music. The combination of generative pre-training and a new dataset for this task results in 77% stronger performance on melody transcription relative to the strongest available baseline. By pairing our new melody transcription approach with solutions for beat detection, key estimation, and chord recognition, we build Sheet Sage, a system capable of transcribing human-readable lead sheets directly from music audio. Audio examples can be found at https://chrisdonahue.com/sheetsage and code at https://github.com/chrisdonahue/sheetsage.
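The lead-sheet system described combines several components. A hypothetical sketch of how such a pipeline might be composed (the function names and return shapes are placeholders, not the actual sheetsage API):

```python
# Placeholder composition of a lead-sheet transcription pipeline:
# melody transcription plus beat, key, and chord analysis.
def transcribe_lead_sheet(audio, melody_model, beat_fn, key_fn, chord_fn):
    beats = beat_fn(audio)            # beat positions in seconds
    key = key_fn(audio)               # estimated key
    chords = chord_fn(audio, beats)   # one chord label per beat
    notes = melody_model(audio)       # e.g. a Jukebox-feature melody model
    return {"key": key, "beats": beats, "chords": chords, "melody": notes}

# Demo with stub components standing in for real models.
demo = transcribe_lead_sheet(
    "song.wav",
    melody_model=lambda a: ["E4", "D4", "C4"],
    beat_fn=lambda a: [0.0, 0.5, 1.0],
    key_fn=lambda a: "C major",
    chord_fn=lambda a, beats: ["C", "G", "C"],
)
print(demo["key"])  # C major
```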


AI music generators could be a boon for artists -- but also problematic

#artificialintelligence

It was only five years ago that electronic punk band YACHT entered the recording studio with a daunting task: They would train an AI on 14 years of their music, then synthesize the results into the album "Chain Tripping." "I'm not interested in being a reactionary," YACHT member and tech writer Claire L. Evans said in a documentary about the album. "I don't want to return to my roots and play acoustic guitar because I'm so freaked out about the coming robot apocalypse, but I also don't want to jump into the trenches and welcome our new robot overlords either." But our new robot overlords are making a whole lot of progress in the space of AI music generation. Even though the Grammy-nominated "Chain Tripping" was released in 2019, the technology behind it is already becoming outdated.


Google's new AI can hear a snippet of song--and then keep on playing

#artificialintelligence

AI-generated audio is commonplace: voices on home assistants like Alexa use natural language processing. AI music systems like OpenAI's Jukebox have already generated impressive results, but most existing techniques need people to prepare transcriptions and label text-based training data, which takes a lot of time and human labor. Jukebox, for example, uses text-based data to generate song lyrics. AudioLM, described in a non-peer-reviewed paper last month, is different: it doesn't require transcription or labeling. Instead, sound databases are fed into the program, and machine learning is used to compress the audio files into sound snippets, called "tokens," without losing too much information.
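The idea of compressing audio into discrete "tokens" can be illustrated with a much simpler scheme than AudioLM's learned tokenizer (which this excerpt does not detail): classic 8-bit mu-law quantization, as used in older raw-audio models like WaveNet.

```python
import numpy as np

# Toy audio tokenization: map samples in [-1, 1] to 256 discrete ids
# via mu-law companding. AudioLM instead learns its tokenizer from data.
def mu_law_tokens(x, mu=255):
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu).astype(np.int64)  # ids in [0, 255]

audio = np.sin(np.linspace(0, 2 * np.pi, 100))  # 100 samples in [-1, 1]
tokens = mu_law_tokens(audio)
print(tokens.min(), tokens.max())
```

Once audio is a token sequence, a language-model-style network can be trained to continue it, which is the core of the "hear a snippet, keep playing" behavior.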


Transfer Learning with Jukebox for Music Source Separation

Amri, W. Zai El, Tautz, O., Ritter, H., Melnik, A.

arXiv.org Artificial Intelligence

In this work, we demonstrate how a publicly available, pre-trained Jukebox model can be adapted for the problem of audio source separation from a single mixed audio channel. Our neural network architecture, which uses transfer learning, is quick to train, and the results demonstrate performance comparable to other state-of-the-art approaches that require far more compute resources, training data, and time. We provide an open-source code implementation of our architecture (https://github.com/wzaielamri/unmix).
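The transfer-learning recipe behind this kind of result can be sketched abstractly: treat the pretrained model's activations as fixed features and fit only a small head on top. The shapes and the closed-form linear head below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
features = rng.standard_normal((200, 64))   # frozen pretrained activations
targets = rng.standard_normal((200, 2))     # e.g. a separation target

# Train only the head; the pretrained weights that produced `features`
# are never updated. Here the head is fit in closed form by least squares.
head, *_ = np.linalg.lstsq(features, targets, rcond=None)
pred = features @ head
print(head.shape)  # (64, 2)
```

Because only the small head is optimized, training is fast and needs far less data than learning the full representation from scratch.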


Do We Rage Against the AI Machine?

#artificialintelligence

The Industrial Revolution was a time of great change. With the steam engine, industries shifted away from skilled human labour towards mechanisation and machinery. As a result, many specialised workers lost their jobs and were forced to adapt to their new reality. The Luddites, a radical organisation of textile workers made redundant by textile machines, retaliated by destroying these machines and attacking business owners. The Luddites gained public sympathy, as many feared that they, like the retrenched textile workers, would lose their jobs to automated machinery.