Tailleur, Modan
Sound Scene Synthesis at the DCASE 2024 Challenge
Lagrange, Mathieu, Lee, Junwon, Tailleur, Modan, Heller, Laurie M., Choi, Keunwoo, McFee, Brian, Imoto, Keisuke, Okamoto, Yuki
This paper presents Task 7 of the DCASE 2024 Challenge: sound scene synthesis. Recent advances in sound synthesis and generative models have enabled the creation of realistic and diverse audio content. We introduce a standardized evaluation framework for comparing sound scene synthesis systems, incorporating both objective and subjective metrics. The challenge attracted four submissions, which were evaluated using the Fréchet Audio Distance (FAD) and human perceptual ratings. Our analysis offers insights into the current capabilities and limitations of sound scene synthesis systems and highlights areas for future improvement in this rapidly evolving field.
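For context, the FAD used above compares Gaussian fits of the embedding distributions of reference and generated audio. Below is a minimal Python sketch of that computation, assuming precomputed embeddings (e.g., from an audio encoder such as VGGish or CLAP); the function and variable names are illustrative, not the challenge's official tooling.

import numpy as np
from scipy.linalg import sqrtm

def frechet_audio_distance(ref_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """FAD between two embedding sets, each shaped (n_samples, dim)."""
    mu_r, mu_g = ref_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.cov(ref_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)
    # Matrix square root of the covariance product; drop the tiny imaginary
    # parts that numerical error can introduce.
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

Lower FAD indicates that the generated set's embedding statistics lie closer to those of the reference set.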
Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation
Lee, Junwon, Tailleur, Modan, Heller, Laurie M., Choi, Keunwoo, Lagrange, Mathieu, McFee, Brian, Imoto, Keisuke, Okamoto, Yuki
Despite significant advances in neural text-to-audio generation, challenges persist in controllability and evaluation. This paper addresses these issues through the Sound Scene Synthesis challenge held as part of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024. We present an evaluation protocol that combines an objective metric, the Fréchet Audio Distance, with perceptual assessments, using a structured prompt format to enable diverse captions and effective evaluation. Our analysis reveals varying performance across sound categories and model architectures: larger models generally excel, but innovative lightweight approaches also show promise. The strong correlation between objective metrics and human ratings validates our evaluation approach. We discuss the outcomes in terms of audio quality, controllability, and architectural considerations for text-to-audio synthesizers, providing directions for future research.
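The correlation between objective metrics and human ratings mentioned above is commonly quantified with a rank correlation such as Spearman's rho. The sketch below is purely illustrative (the per-system numbers are hypothetical, not results from the paper): since lower FAD and higher human ratings both indicate better systems, a strong negative rho would support the evaluation protocol.

import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-system scores: lower FAD is better, higher rating is better.
fad_scores = np.array([2.1, 3.4, 4.0, 5.2])
human_ratings = np.array([7.8, 6.5, 6.1, 4.9])

rho, pval = spearmanr(fad_scores, human_ratings)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")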
EMVD dataset: a dataset of extreme vocal distortion techniques used in heavy metal
Tailleur, Modan, Pinquier, Julien, Millot, Laurent, Vogel, Corsin, Lagrange, Mathieu
In this paper, we introduce the Extreme Metal Vocals Dataset, a collection of recordings of extreme vocal techniques performed within the realm of heavy metal music. The dataset consists of 760 audio excerpts ranging from 1 to 30 seconds in duration, totaling about 100 minutes of audio material: roughly 60 minutes of distorted voice and 40 minutes of clear voice recordings. The recordings come from 27 different singers and are provided without accompanying musical instruments or post-processing effects. The distortion taxonomy within the dataset covers four distinct distortion techniques and three vocal effects, all performed across different pitch ranges. The performance of a state-of-the-art deep learning model is evaluated on two classification tasks related to vocal techniques, demonstrating the potential of this resource for the audio processing community.
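Because the dataset contains multiple excerpts per singer, classification experiments on it typically benefit from singer-independent splits so that no voice appears in both training and test sets. A minimal sketch, assuming per-excerpt metadata with singer IDs and technique labels (the labels and split below are hypothetical; the paper's exact protocol may differ):

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical metadata arrays: one entry per excerpt.
X = np.arange(760).reshape(-1, 1)             # placeholder features / file indices
y = np.random.randint(0, 4, size=760)         # hypothetical technique labels
singers = np.random.randint(0, 27, size=760)  # hypothetical singer IDs

# Hold out entire singers so no voice leaks across the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=singers))
assert set(singers[train_idx]).isdisjoint(singers[test_idx])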