Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
Zhu, Yongxin, Su, Dan, He, Liqiang, Xu, Linli, Yu, Dong
While recent speech language models have achieved significant progress, they still face substantial challenges in modeling the long acoustic sequences produced by neural audio codecs. In this paper, we introduce the Generative Pre-trained Speech Transformer (GPST), a hierarchical transformer designed for efficient speech language modeling. GPST quantizes audio waveforms into two distinct types of discrete speech representations and integrates them within a hierarchical transformer architecture, allowing for a unified one-stage generation process and enhancing high-resolution audio generation capabilities. By training on large corpora of speech in an end-to-end unsupervised manner, GPST can generate syntactically consistent speech with diverse speaker identities. Given a brief 3-second prompt, GPST can produce natural and coherent personalized speech, demonstrating in-context learning abilities. Moreover, our approach can easily be extended to cross-lingual speech generation by incorporating multilingual semantic tokens and universal acoustic tokens. Experimental results indicate that GPST significantly outperforms existing speech language models in word error rate, speech quality, and speaker similarity. See https://youngsheen.github.io/GPST/demo for demo samples.
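The two-level design the abstract describes, one expensive "global" step per frame and several cheap "local" steps for the codec codes inside it, can be sketched as a toy nested decoding loop. Everything below (sizes, the stub models, the toy embedding) is an illustrative assumption, not GPST's actual architecture:

```python
import random

random.seed(0)
D, Q, V = 16, 4, 64   # assumed hidden size, codebooks per frame, codebook size

def global_step(ctx):
    # Stand-in for the large "global" transformer: one vector per frame.
    return [sum(ctx) * random.random() for _ in range(D)]

def local_step(frame_emb, prev_codes):
    # Stand-in for the small "local" transformer: one code per codebook.
    score = sum(frame_emb) + len(prev_codes)
    return int(score * 1000) % V

def generate(semantic_tokens, n_frames):
    ctx = [t % 8 for t in semantic_tokens]  # toy semantic "embedding"
    frames = []
    for _ in range(n_frames):
        h = global_step(ctx)     # one expensive step per frame
        codes = []
        for _ in range(Q):       # Q cheap steps inside the frame
            codes.append(local_step(h, codes))
        frames.append(codes)
    return frames

frames = generate([3, 1, 4, 1, 5], n_frames=6)
print(len(frames), len(frames[0]))  # 6 4
```

The point of the nesting is cost: the big model runs once per frame rather than once per codec token, which is what makes one-stage generation over long codec sequences tractable.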
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Nachmani, Eliya, Levkovitch, Alon, Hirsch, Roy, Salazar, Julian, Asawaroengchai, Chulayuth, Mariooryad, Soroosh, Rivlin, Ehud, Skerry-Ryan, RJ, Ramanovich, Michelle Tadmor
We present a novel approach to adapting pre-trained large language models (LLMs) to perform question answering (QA) and speech continuation. By endowing the LLM with a pre-trained speech encoder, our model can take speech inputs and generate speech outputs. The entire system is trained end-to-end and operates directly on spectrograms, simplifying our architecture. Key to our approach is a training objective that jointly supervises speech recognition, text continuation, and speech synthesis using only speech-text pairs, enabling a 'cross-modal' chain-of-thought within a single decoding pass. Our method surpasses existing spoken language models in speaker preservation and semantic coherence. Furthermore, the proposed model improves upon direct initialization in retaining the knowledge of the original LLM, as demonstrated on spoken QA datasets. Audio samples can be found at https://michelleramanovich.github.io/spectron/spectron
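The joint objective the abstract describes, one decoded sequence carrying the transcript, its text continuation, and then the spectrogram frames, with a loss term per span, can be sketched with toy numbers. The cross-entropy surrogate, the L1 spectrogram loss, and all values below are stand-in assumptions, not the paper's exact losses:

```python
import math

def cross_entropy(probs_for_targets):
    # Mean negative log-likelihood of the target tokens.
    return -sum(math.log(p) for p in probs_for_targets) / len(probs_for_targets)

def l1(pred, target):
    return sum(abs(a - b) for a, b in zip(pred, target)) / len(pred)

# Made-up per-target-token model probabilities for the two text spans:
asr_probs  = [0.9, 0.8, 0.95]    # transcript tokens (speech recognition)
cont_probs = [0.7, 0.6]          # continuation tokens (text continuation)
# Toy predicted vs. reference spectrogram "frames" (speech synthesis):
pred_spec, ref_spec = [0.2, 0.5, 0.1], [0.25, 0.4, 0.1]

loss = cross_entropy(asr_probs) + cross_entropy(cont_probs) + l1(pred_spec, ref_spec)
print(round(loss, 4))
```

Because all three spans live in one decoded sequence, a single autoregressive pass first "reasons" in text (transcribe, then continue) before emitting audio, which is the cross-modal chain-of-thought.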
AudioLM: a Language Modeling Approach to Audio Generation
Borsos, Zalán, Marinier, Raphaël, Vincent, Damien, Kharitonov, Eugene, Pietquin, Olivier, Sharifi, Matt, Roblek, Dominik, Teboul, Olivier, Grangier, David, Tagliasacchi, Marco, Zeghidour, Neil
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenization scheme to achieve both objectives. Namely, we leverage the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec to achieve high-quality synthesis. By training on large corpora of raw audio waveforms, AudioLM learns to generate natural and coherent continuations given short prompts. When trained on speech, and without any transcript or annotation, AudioLM generates syntactically and semantically plausible speech continuations while also maintaining speaker identity and prosody for unseen speakers. Furthermore, we demonstrate how our approach extends beyond speech by generating coherent piano music continuations, despite being trained without any symbolic representation of music.
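The staged pipeline the abstract describes, semantic tokens for long-term structure, then codec tokens conditioned on them for quality, can be sketched as three chained stages. The stage functions below are deterministic stubs; only the conditioning order (semantic, then coarse acoustic, then fine acoustic) mirrors the paper:

```python
def semantic_stage(prompt_tokens, n_new):
    # Continue the semantic sequence (stub: repeat the prompt pattern).
    return [prompt_tokens[i % len(prompt_tokens)] for i in range(n_new)]

def coarse_stage(semantic, codebooks=2):
    # Coarse codec tokens per semantic token (stub mapping).
    return [[(s * 7 + q) % 256 for q in range(codebooks)] for s in semantic]

def fine_stage(coarse, codebooks=6):
    # Refine each frame with the remaining fine codebooks (stub mapping).
    return [frame + [(sum(frame) + q) % 256 for q in range(codebooks)]
            for frame in coarse]

prompt = [12, 7, 12, 9]                  # semantic tokens of a short prompt
semantic = prompt + semantic_stage(prompt, n_new=8)
frames = fine_stage(coarse_stage(semantic))
print(len(frames), len(frames[0]))       # 12 frames, 8 codes each
```

The split is the paper's trade-off made explicit: the semantic stage never sees fine acoustic detail, so it can model long-range structure cheaply, while the acoustic stages only have to fill in audio quality locally.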
SoundStorm: Efficient Parallel Audio Generation
Borsos, Zalán, Sharifi, Matt, Vincent, Damien, Kharitonov, Eugene, Zeghidour, Neil, Tagliasacchi, Marco
We present SoundStorm, a model for efficient, non-autoregressive audio generation. SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional attention and confidence-based parallel decoding to generate the tokens of a neural audio codec. Compared to the autoregressive generation approach of AudioLM, our model produces audio of the same quality and with higher consistency in voice and acoustic conditions, while being two orders of magnitude faster. SoundStorm generates 30 seconds of audio in 0.5 seconds on a TPU-v4. We demonstrate the ability of our model to scale audio generation to longer sequences by synthesizing high-quality, natural dialogue segments, given a transcript annotated with speaker turns and a short prompt with the speakers' voices.
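Confidence-based parallel decoding can be sketched as a mask-and-commit loop: start with every position masked, predict all masked positions at once, commit only the highest-confidence fraction, and repeat. The predictor below is a random stub and the schedule is an assumption; only the commit-by-confidence loop reflects the method:

```python
import random

random.seed(1)
MASK, V = -1, 64

def predict(tokens):
    # Stand-in for the bidirectional model: for each masked slot, propose a
    # token and a confidence score (both random here).
    return {i: (random.randrange(V), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def parallel_decode(length, n_steps=4):
    tokens = [MASK] * length
    for step in range(n_steps):
        proposals = predict(tokens)
        if not proposals:
            break
        # Commit the highest-confidence fraction this round, keep the rest masked.
        keep = max(1, len(proposals) // (n_steps - step))
        best = sorted(proposals.items(), key=lambda kv: -kv[1][1])[:keep]
        for i, (tok, _) in best:
            tokens[i] = tok
    # Final pass: fill anything still masked.
    for i, (tok, _) in predict(tokens).items():
        tokens[i] = tok
    return tokens

out = parallel_decode(16)
print(all(t != MASK for t in out))  # True
```

The speedup over autoregressive decoding comes from the loop count: a fixed, small number of model calls regardless of sequence length, instead of one call per token.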
SingSong: Generating musical accompaniments from singing
Donahue, Chris, Caillon, Antoine, Roberts, Adam, Manilow, Ethan, Esling, Philippe, Agostinelli, Andrea, Verzetti, Mauro, Simon, Ian, Pietquin, Olivier, Zeghidour, Neil, Engel, Jesse
We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice. To accomplish this, we build on recent developments in musical source separation and audio generation. Specifically, we apply a state-of-the-art source separation algorithm to a large corpus of music audio to produce aligned pairs of vocals and instrumental sources. Then, we adapt AudioLM (Borsos et al., 2022) -- a state-of-the-art approach for unconditional audio generation -- to be suitable for conditional "audio-to-audio" generation tasks, and train it on the source-separated (vocal, instrumental) pairs. In a pairwise comparison with the same vocal inputs, listeners expressed a significant preference for instrumentals generated by SingSong compared to those from a strong retrieval baseline. Sound examples at https://g.co/magenta/singsong
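The data recipe the abstract describes, run a separator over a music corpus to get aligned (vocal, instrumental) training pairs, can be sketched as below. The `separate` function is a toy stand-in, not a real source-separation model:

```python
def separate(mix):
    # Toy "separator": split each sample into a pseudo-vocal and the residual.
    vocals = [0.6 * x for x in mix]
    instrumental = [x - v for x, v in zip(mix, vocals)]
    return vocals, instrumental

corpus = [[0.1, -0.3, 0.5], [0.0, 0.2, -0.1]]   # toy "songs"
pairs = [separate(mix) for mix in corpus]        # (input, target) training pairs

# A conditional model would then be trained to map vocals -> instrumental:
for vocals, instrumental in pairs:
    assert len(vocals) == len(instrumental)      # aligned by construction
print(len(pairs))
```

The appeal of the recipe is that it manufactures supervised audio-to-audio pairs from unlabeled music, since separator outputs are time-aligned with each other by construction.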
MusicLM: Generating Music From Text
Agostinelli, Andrea, Denk, Timo I., Borsos, Zalán, Engel, Jesse, Verzetti, Mauro, Caillon, Antoine, Huang, Qingqing, Jansen, Aren, Roberts, Adam, Tagliasacchi, Marco, Sharifi, Matt, Zeghidour, Neil, Frank, Christian
We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff". MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description. Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption. To support future research, we publicly release MusicCaps, a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts.
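The hierarchical sequence-to-sequence framing can be sketched as a conditioning chain: a text embedding conditions a semantic-token stage, whose output conditions an acoustic-token stage. Every component below is a deterministic stub used only to show the chain, not MusicLM's actual models:

```python
def embed_text(caption):
    # Toy text "embedding": a single integer summary of the caption.
    return sum(ord(c) for c in caption) % 97

def semantic_stage(text_emb, n):
    # Semantic tokens conditioned on the text embedding (stub mapping).
    return [(text_emb * (i + 1)) % 512 for i in range(n)]

def acoustic_stage(semantic):
    # Acoustic codec tokens conditioned on the semantic tokens (stub mapping).
    return [[(s + q) % 1024 for q in range(4)] for s in semantic]

caption = "a calming violin melody backed by a distorted guitar riff"
codes = acoustic_stage(semantic_stage(embed_text(caption), n=10))
print(len(codes), len(codes[0]))  # 10 4
```

Melody conditioning fits the same chain: a melody embedding would simply be injected alongside the text embedding at the semantic stage.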
Google's three transformative areas of AI
Years of research have led to rapid progress in artificial intelligence (AI). On November 2, Google announced three ways people are poised to benefit from these advancements. Jeff Dean, senior vice president of Google Research and Health, presented three transformative areas of AI: first, using AI to make technology accessible in many more languages; second, exploring how AI might bolster creativity; and third, AI for social good, including climate adaptation. The 1,000 Languages Initiative is an ambitious research project to build an AI model supporting the 1,000 most spoken languages of the world. To provide AI-based language technology for the world, Google needs to ensure its models are also trained on content representative of the world.
Google's Audiolm: Generating Music by Hearing a Song's Snippet
Originally published on Towards AI. AudioLM is Google's new model, capable of generating music in the same style as the prompt.
Google's new AI can hear a snippet of song, then keep on playing
AI-generated audio is commonplace: voices on home assistants like Alexa use natural language processing. AI music systems like OpenAI's Jukebox have already generated impressive results, but most existing techniques need people to prepare transcriptions and label text-based training data, which takes a lot of time and human labor. Jukebox, for example, uses text-based data to generate song lyrics. AudioLM, described in a non-peer-reviewed paper last month, is different: it doesn't require transcription or labeling. Instead, sound databases are fed into the program, and machine learning is used to compress the audio files into sound snippets, called "tokens," without losing too much information.
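The "compress audio into tokens" idea the article summarizes can be illustrated with the simplest possible quantizer: map each sample to the index of its nearest codebook entry. A real neural codec learns its codebook and operates on whole frames; the fixed codebook and nearest-neighbour lookup below are toy assumptions:

```python
CODEBOOK = [-0.8, -0.3, 0.0, 0.3, 0.8]   # assumed toy codebook

def tokenize(samples):
    # Map each sample to the index of the nearest codebook entry (lossy).
    return [min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - s))
            for s in samples]

def detokenize(tokens):
    return [CODEBOOK[t] for t in tokens]

audio = [0.05, 0.31, -0.75, 0.9, -0.28]
tokens = tokenize(audio)
print(tokens)                      # [2, 3, 0, 4, 1]
roundtrip = detokenize(tokens)     # lossy reconstruction of the input
```

Once audio is a sequence of small integers like this, it can be fed to a language model exactly as text tokens would be, which is what lets AudioLM skip transcription and labeling entirely.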