Choi, Keunwoo
KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation
Chung, Yoonjin, Eu, Pilsun, Lee, Junwon, Choi, Keunwoo, Nam, Juhan, Chon, Ben Sangbae
Although widely adopted for evaluating generated audio signals, the Fr\'echet Audio Distance (FAD) suffers from significant limitations, including reliance on Gaussian assumptions, sensitivity to sample size, and high computational complexity. As an alternative, we introduce the Kernel Audio Distance (KAD), a novel, distribution-free, unbiased, and computationally efficient metric based on Maximum Mean Discrepancy (MMD). Through analysis and empirical validation, we demonstrate KAD's advantages: (1) faster convergence with smaller sample sizes, enabling reliable evaluation with limited data; (2) lower computational cost, with scalable GPU acceleration; and (3) stronger alignment with human perceptual judgments. By leveraging advanced embeddings and characteristic kernels, KAD captures nuanced differences between real and generated audio. Open-sourced in the kadtk toolkit, KAD provides an efficient, reliable, and perceptually aligned benchmark for evaluating generative audio models.
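To make the kernel-based estimator concrete, the following is a minimal NumPy sketch of an unbiased squared-MMD estimate over audio embeddings with a Gaussian (RBF) kernel; the kernel choice, bandwidth, and embedding source here are illustrative assumptions, not the kadtk implementation.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth):
    """Gaussian (characteristic) kernel matrix between two embedding sets."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def unbiased_mmd2(real_emb, gen_emb, bandwidth=10.0):
    """Unbiased estimate of squared MMD between real and generated embeddings.

    real_emb, gen_emb: arrays of shape (n, d) and (m, d) holding audio
    embeddings (e.g., from a pretrained audio classifier); no Gaussian
    assumption is placed on the embedding distributions.
    """
    n, m = len(real_emb), len(gen_emb)
    k_xx = rbf_kernel(real_emb, real_emb, bandwidth)
    k_yy = rbf_kernel(gen_emb, gen_emb, bandwidth)
    k_xy = rbf_kernel(real_emb, gen_emb, bandwidth)
    # Drop diagonal terms so the within-set averages are unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    term_xy = 2.0 * k_xy.mean()
    return term_xx + term_yy - term_xy

# Toy usage with random "embeddings"; real use would feed model features.
rng = np.random.default_rng(0)
real = rng.normal(size=(256, 128))
fake = rng.normal(loc=0.1, size=(256, 128))
print(unbiased_mmd2(real, fake))
```

Because the estimate is a sum of kernel averages, it needs no covariance estimation, matrix inversion, or matrix square root, which is one reason a kernel-based metric can be cheaper and more sample-efficient than FAD.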
Sound Scene Synthesis at the DCASE 2024 Challenge
Lagrange, Mathieu, Lee, Junwon, Tailleur, Modan, Heller, Laurie M., Choi, Keunwoo, McFee, Brian, Imoto, Keisuke, Okamoto, Yuki
This paper presents Task 7 at the DCASE 2024 Challenge: sound scene synthesis. Recent advances in sound synthesis and generative models have enabled the creation of realistic and diverse audio content. We introduce a standardized evaluation framework for comparing different sound scene synthesis systems, incorporating both objective and subjective metrics. The challenge attracted four submissions, which are evaluated using the Fr\'echet Audio Distance (FAD) and human perceptual ratings. Our analysis reveals significant insights into the current capabilities and limitations of sound scene synthesis systems, while also highlighting areas for future improvement in this rapidly evolving field.
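For reference, FAD fits a multivariate Gaussian to the embeddings of the reference set and of the generated set and reports the Fr\'echet distance between the two Gaussians; a standard statement of the formula (the embedding model is whatever the evaluation protocol specifies) is:

```latex
\mathrm{FAD}
  = \lVert \mu_r - \mu_g \rVert_2^{2}
  + \operatorname{Tr}\!\bigl( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \bigr)
```

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the embedding mean and covariance of the reference and generated audio sets, respectively.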
Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation
Lee, Junwon, Tailleur, Modan, Heller, Laurie M., Choi, Keunwoo, Lagrange, Mathieu, McFee, Brian, Imoto, Keisuke, Okamoto, Yuki
Despite significant advancements in neural text-to-audio generation, challenges persist in controllability and evaluation. This paper addresses these issues through the Sound Scene Synthesis challenge held as part of the Detection and Classification of Acoustic Scenes and Events 2024. We present an evaluation protocol combining an objective metric, the Fr\'echet Audio Distance, with perceptual assessments, utilizing a structured prompt format to enable diverse captions and effective evaluation. Our analysis reveals varying performance across sound categories and model architectures, with larger models generally excelling but innovative lightweight approaches also showing promise. The strong correlation between objective metrics and human ratings validates our evaluation approach. We discuss outcomes in terms of audio quality, controllability, and architectural considerations for text-to-audio synthesizers, providing direction for future research.
A Demand-Driven Perspective on Generative Audio AI
Oh, Sangshin, Kang, Minsung, Moon, Hyeongi, Choi, Keunwoo, Chon, Ben Sangbae
To achieve successful deployment of AI research, it is crucial to understand the demands of the industry. In this paper, we present the results of a survey conducted with professional audio engineers, in order to determine research priorities and define various research tasks. We also summarize the current challenges in audio quality and controllability based on the survey. Our analysis emphasizes that the availability of datasets is currently the main bottleneck for achieving high-quality audio generation. Finally, we suggest potential solutions, supported by empirical evidence, for some of the issues the survey revealed.
MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation
Jeon, Chang-Bin, Moon, Hyeongi, Choi, Keunwoo, Chon, Ben Sangbae, Lee, Kyogu
Separating multiple singing voices into individual voices is a rarely studied area in music source separation research. The absence of a benchmark dataset has hindered its progress. In this paper, we present an evaluation dataset and provide baseline studies for multiple singing voices separation. First, we introduce MedleyVox, an evaluation dataset for multiple singing voices separation. We specify the problem definition in this dataset by categorizing it into i) unison, ii) duet, iii) main vs. rest, and iv) N-singing separation. Second, to overcome the absence of existing multi-singing datasets for training, we present a strategy for constructing multiple singing mixtures using various single-singing datasets. Third, we propose the improved super-resolution network (iSRNet), which greatly enhances initial estimates of separation networks. Jointly trained with the Conv-TasNet and the multi-singing mixture construction strategy, the proposed iSRNet achieved comparable performance to ideal time-frequency masks on duet and unison subsets of MedleyVox. Audio samples, the dataset, and codes are available on our website (https://github.com/jeonchangbin49/MedleyVox).
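As a rough illustration of the mixture-construction idea, the sketch below sums randomly chosen single-singing excerpts with random relative gains to form a multi-singing mixture and its separation targets; the sampling scheme and gain range are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def make_multi_singing_mixture(voices, num_singers=2, rng=None):
    """Mix randomly chosen single-singing excerpts into one multi-singing clip.

    voices: list of mono waveforms (1-D float arrays) at the same sample rate.
    Returns (mixture, sources) so the sources can serve as separation targets.
    """
    rng = rng or np.random.default_rng()
    picks = rng.choice(len(voices), size=num_singers, replace=False)
    length = min(len(voices[i]) for i in picks)
    sources = []
    for i in picks:
        segment = voices[i][:length].astype(np.float64)
        gain_db = rng.uniform(-3.0, 3.0)          # random relative loudness
        sources.append(segment * 10.0 ** (gain_db / 20.0))
    mixture = np.sum(sources, axis=0)
    peak = np.max(np.abs(mixture))
    if peak > 1.0:                                # avoid clipping
        mixture, sources = mixture / peak, [s / peak for s in sources]
    return mixture, sources

# Toy usage with synthetic "voices"; real use would load single-singing stems.
rng = np.random.default_rng(0)
voices = [rng.uniform(-0.1, 0.1, size=44100) for _ in range(5)]
mix, stems = make_multi_singing_mixture(voices, num_singers=2, rng=rng)
```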
Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training
Cheuk, Kin Wai, Choi, Keunwoo, Kong, Qiuqiang, Li, Bochen, Won, Minz, Wang, Ju-Chiang, Hung, Yun-Ning, Herremans, Dorien
In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results. The joint training of the transcription and source separation modules serves to improve the performance of both tasks. The instrument module is optional and can be directly controlled by human users. This makes Jointist a flexible user-controllable framework. Our challenging problem formulation makes the model highly useful in the real world given that modern popular music typically consists of multiple instruments. Its novelty, however, necessitates a new perspective on how to evaluate such a model. In our experiments, we assess the proposed model from various aspects, providing a new evaluation perspective for multi-instrument transcription. Our subjective listening study shows that Jointist achieves state-of-the-art performance on popular music, outperforming existing multi-instrument transcription models such as MT3. We conducted experiments on several downstream tasks and found that the proposed method improved transcription by more than 1 percentage point (ppt.), source separation by 5 SDR, downbeat detection by 1.8 ppt., chord recognition by 1.4 ppt., and key estimation by 1.4 ppt., when utilizing transcription results obtained from Jointist. Demo available at \url{https://jointist.github.io/Demo}.
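The conditioning structure can be sketched at a very high level as below; the module internals, feature shapes, and losses are toy placeholders rather than the released Jointist architecture, but they show how the instrument-recognition output (optionally overridden by a user) conditions both the transcription and separation branches, and how the two losses are summed for joint training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointistSketch(nn.Module):
    """Toy stand-in for the three-module design: instrument recognition
    conditions both transcription and source separation."""

    def __init__(self, feat_dim=64, num_instruments=10, num_pitches=88):
        super().__init__()
        self.recognizer = nn.Linear(feat_dim, num_instruments)   # which instruments are present
        self.transcriber = nn.Linear(feat_dim + num_instruments, num_pitches)
        self.separator = nn.Linear(feat_dim + num_instruments + num_pitches, feat_dim)

    def forward(self, audio_feat, instrument_override=None):
        # audio_feat: (batch, time, feat_dim) frame-level audio features.
        inst = torch.sigmoid(self.recognizer(audio_feat.mean(dim=1)))
        if instrument_override is not None:       # optional user control
            inst = instrument_override
        inst_t = inst.unsqueeze(1).expand(-1, audio_feat.size(1), -1)
        piano_roll = torch.sigmoid(self.transcriber(torch.cat([audio_feat, inst_t], dim=-1)))
        separated = self.separator(torch.cat([audio_feat, inst_t, piano_roll], dim=-1))
        return inst, piano_roll, separated

# Joint training sketch: sum of transcription and separation losses.
model = JointistSketch()
feat = torch.randn(2, 100, 64)
inst, roll, sep = model(feat)
loss = F.binary_cross_entropy(roll, torch.zeros_like(roll)) + F.mse_loss(sep, feat)
loss.backward()
```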
A Proposal for Foley Sound Synthesis Challenge
Choi, Keunwoo, Oh, Sangshin, Kang, Minsung, McFee, Brian
Foley refers to sound effects that are added to multimedia during post-production to enhance its perceived acoustic properties, e.g., by simulating the sounds of footsteps, ambient environmental sounds, or visible objects on the screen. While foley is traditionally produced by foley artists, there is increasing interest in automatic or machine-assisted techniques building upon recent advances in sound synthesis and generative models. To foster more participation in this growing research area, we propose a challenge for automatic foley synthesis. Through case studies on successful previous challenges in audio and machine learning, we set the goals of the proposed challenge: rigorous, unified, and efficient evaluation of different foley synthesis systems, with an overarching goal of drawing active participation from the research community. We outline the details and design considerations of a foley sound synthesis challenge, including task definition, dataset requirements, and evaluation criteria.
Deep Unsupervised Drum Transcription
Choi, Keunwoo, Cho, Kyunghyun
We introduce DrummerNet, a drum transcription system that is trained in an unsupervised manner. DrummerNet does not require any ground-truth transcription and, with the data-scalability of deep neural networks, learns from a large unlabeled dataset. In DrummerNet, the target drum signal is first passed to a (trainable) transcriber, then reconstructed in a (fixed) synthesizer according to the transcription estimate. By training the system to minimize the distance between the input and the output audio signals, the transcriber learns to transcribe without ground truth transcription. Our experiment shows that DrummerNet performs favorably compared to many other recent drum transcription systems, both supervised and unsupervised.
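A minimal sketch of the analysis-by-synthesis loop, assuming placeholder networks and drum one-shot samples (not DrummerNet's actual components): a trainable transcriber outputs per-class activation curves, a fixed synthesizer renders audio by convolving each curve with a drum sample, and an audio-domain reconstruction loss trains the transcriber without any ground-truth transcription.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTranscriber(nn.Module):
    """Trainable part: waveform in, per-drum-class activation curves out."""
    def __init__(self, num_drums=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=65, padding=32),
            nn.ReLU(),
            nn.Conv1d(16, num_drums, kernel_size=65, padding=32),
        )

    def forward(self, wav):                      # wav: (batch, 1, samples)
        return torch.relu(self.net(wav))         # non-negative activations

def fixed_synthesizer(activations, drum_samples):
    """Fixed (non-trainable) part: render audio by convolving each activation
    curve with the corresponding drum one-shot sample."""
    batch, num_drums, length = activations.shape
    out = torch.zeros(batch, 1, length)
    for d in range(num_drums):
        kernel = drum_samples[d].flip(0).view(1, 1, -1)   # flip: conv, not correlation
        rendered = F.conv1d(activations[:, d:d + 1, :], kernel,
                            padding=kernel.shape[-1] // 2)
        out = out + rendered[..., :length]
    return out

transcriber = TinyTranscriber()
drum_samples = [torch.randn(256) for _ in range(3)]       # placeholder one-shots
wav = torch.randn(4, 1, 4000)                             # unlabeled drum audio
recon = fixed_synthesizer(transcriber(wav), drum_samples)
loss = F.l1_loss(recon, wav)                              # audio-domain distance
loss.backward()                                           # only the transcriber learns
```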
Towards Music Captioning: Generating Music Playlist Descriptions
Choi, Keunwoo, Fazekas, George, McFee, Brian, Cho, Kyunghyun, Sandler, Mark
Descriptions are often provided along with recommendations to help users' discovery. Recommending automatically generated music playlists (e.g. personalised playlists) introduces the problem of generating descriptions. In this paper, we propose a method for generating music playlist descriptions, which we call music captioning. In the proposed method, audio content analysis and natural language processing are adopted to utilise the information of each track.