Lee, Junwon
KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation
Chung, Yoonjin, Eu, Pilsun, Lee, Junwon, Choi, Keunwoo, Nam, Juhan, Chon, Ben Sangbae
Although widely adopted for evaluating generated audio signals, the Fr\'echet Audio Distance (FAD) suffers from significant limitations, including reliance on Gaussian assumptions, sensitivity to sample size, and high computational complexity. As an alternative, we introduce the Kernel Audio Distance (KAD), a novel, distribution-free, unbiased, and computationally efficient metric based on Maximum Mean Discrepancy (MMD). Through analysis and empirical validation, we demonstrate KAD's advantages: (1) faster convergence with smaller sample sizes, enabling reliable evaluation with limited data; (2) lower computational cost, with scalable GPU acceleration; and (3) stronger alignment with human perceptual judgments. By leveraging advanced embeddings and characteristic kernels, KAD captures nuanced differences between real and generated audio. Open-sourced in the kadtk toolkit, KAD provides an efficient, reliable, and perceptually aligned benchmark for evaluating generative audio models.
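For reference, below is a minimal sketch of the unbiased squared-MMD estimator that underlies a KAD-style comparison of embedding sets, assuming a Gaussian (RBF) kernel with a fixed bandwidth. The kernel choice, bandwidth, and function names here are illustrative assumptions and not the kadtk API.

import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_unbiased(x, y, bandwidth=10.0):
    """Unbiased estimate of squared MMD between embedding sets x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    # Exclude the diagonal terms so the estimator is unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    term_xy = 2.0 * k_xy.mean()
    return term_xx + term_yy - term_xy

# Usage: embeddings of reference and generated audio from any pretrained audio model.
# ref = np.load("ref_embeddings.npy"); gen = np.load("gen_embeddings.npy")
# print(mmd_unbiased(ref, gen))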
Sound Scene Synthesis at the DCASE 2024 Challenge
Lagrange, Mathieu, Lee, Junwon, Tailleur, Modan, Heller, Laurie M., Choi, Keunwoo, McFee, Brian, Imoto, Keisuke, Okamoto, Yuki
This paper presents Task 7 at the DCASE 2024 Challenge: sound scene synthesis. Recent advances in sound synthesis and generative models have enabled the creation of realistic and diverse audio content. We introduce a standardized evaluation framework for comparing different sound scene synthesis systems, incorporating both objective and subjective metrics. The challenge attracted four submissions, which are evaluated using the Fr\'echet Audio Distance (FAD) and human perceptual ratings. Our analysis reveals significant insights into the current capabilities and limitations of sound scene synthesis systems, while also highlighting areas for future improvement in this rapidly evolving field.
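For context, the objective metric used here fits a Gaussian to each set of audio embeddings and compares them with the Fréchet distance. The sketch below shows the standard formulation; it is not the challenge's exact implementation, and the embedding source (e.g. a pretrained audio classifier) is assumed.

import numpy as np
from scipy import linalg

def frechet_audio_distance(ref_emb, gen_emb):
    """FAD between Gaussians fitted to reference and generated embeddings of shape (n, d)."""
    mu_r, mu_g = ref_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.cov(ref_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)
    diff = mu_r - mu_g
    # Matrix square root of the covariance product; keep the real part to
    # discard tiny imaginary components introduced by numerical error.
    covmean = np.real(linalg.sqrtm(cov_r @ cov_g))
    return diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean)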
Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation
Lee, Junwon, Tailleur, Modan, Heller, Laurie M., Choi, Keunwoo, Lagrange, Mathieu, McFee, Brian, Imoto, Keisuke, Okamoto, Yuki
Despite significant advancements in neural text-to-audio generation, challenges persist in controllability and evaluation. This paper addresses these issues through the Sound Scene Synthesis challenge held as part of the Detection and Classification of Acoustic Scenes and Events 2024. We present an evaluation protocol combining an objective metric, namely the Fr\'echet Audio Distance, with perceptual assessments, utilizing a structured prompt format to enable diverse captions and effective evaluation. Our analysis reveals varying performance across sound categories and model architectures, with larger models generally excelling but innovative lightweight approaches also showing promise. The strong correlation between objective metrics and human ratings validates our evaluation approach. We discuss outcomes in terms of audio quality, controllability, and architectural considerations for text-to-audio synthesizers, providing direction for future research.
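As an illustration of the validation step, the rank correlation between per-system objective scores and mean human ratings can be computed as below. The scores shown are placeholders, not the actual challenge results.

import numpy as np
from scipy import stats

# Hypothetical per-system scores: lower FAD is better, higher rating is better.
fad_scores = np.array([3.1, 4.5, 2.8, 5.0])      # one value per submitted system
human_ratings = np.array([4.2, 3.5, 4.4, 3.1])   # mean perceptual rating per system

rho, p_value = stats.spearmanr(fad_scores, human_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A strong negative rho indicates that lower FAD tracks higher perceived quality.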
T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis
Chung, Yoonjin, Lee, Junwon, Nam, Juhan
Foley sound, audio content inserted synchronously with videos, plays a critical role in the user experience of multimedia content. Recently, there has been active research in Foley sound synthesis, leveraging the advancements in deep generative models. However, such works mainly focus on replicating a single sound class or a textual sound description, neglecting temporal information, which is crucial in the practical applications of Foley sound. We present T-Foley, a Temporal-event-guided waveform generation model for Foley sound synthesis. T-Foley generates high-quality audio using two conditions: the sound class and temporal event feature. For temporal conditioning, we devise a temporal event feature and a novel conditioning technique named Block-FiLM. T-Foley achieves superior performance in both objective and subjective evaluation metrics and generates Foley sound well-synchronized with the temporal events. Additionally, we showcase T-Foley's practical applications, particularly in scenarios involving vocal mimicry for temporal event control. We show the demo on our companion website.
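As a rough illustration of block-wise conditioning, the sketch below applies a FiLM-style affine modulation whose parameters are predicted from a temporal conditioning signal pooled per time block. The class name, pooling scheme, and residual form are assumptions for illustration and may differ from the Block-FiLM used in T-Foley.

import torch
import torch.nn as nn

class BlockFiLM(nn.Module):
    """Block-wise FiLM sketch: the temporal conditioning signal is averaged per
    time block, and each block receives its own per-channel scale and shift."""

    def __init__(self, cond_dim, channels, block_size):
        super().__init__()
        self.block_size = block_size
        # Predict a per-channel scale and shift from the pooled conditioning feature.
        self.to_scale_shift = nn.Linear(cond_dim, 2 * channels)

    def forward(self, x, cond):
        # x: (batch, channels, time), cond: (batch, cond_dim, time)
        b, c, t = x.shape
        assert t % self.block_size == 0, "time length must be a multiple of block_size"
        n_blocks = t // self.block_size
        # Average the conditioning signal within each block: (batch, cond_dim, n_blocks).
        cond_blocks = cond.reshape(b, -1, n_blocks, self.block_size).mean(dim=-1)
        scale, shift = self.to_scale_shift(cond_blocks.transpose(1, 2)).chunk(2, dim=-1)
        # Broadcast each block's affine parameters over its time steps.
        scale = scale.transpose(1, 2).repeat_interleave(self.block_size, dim=-1)
        shift = shift.transpose(1, 2).repeat_interleave(self.block_size, dim=-1)
        return x * (1.0 + scale) + shift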
A Novel Patent Similarity Measurement Methodology: Semantic Distance and Technological Distance
Yoo, Yongmin, Jeong, Cheonkam, Gim, Sanguk, Lee, Junwon, Schimke, Zachary, Seo, Deaho
Patent similarity analysis plays a crucial role in evaluating the risk of patent infringement. Nonetheless, this analysis is predominantly conducted manually by legal experts, often resulting in a time-consuming process. Recent advances in natural language processing (NLP) offer a promising avenue for automation, and much recent research focuses on the semantic similarity of patents, yet existing methods still rely on experts to manually classify patents and struggle to accurately analyze patent data, which are legal documents describing complex technologies. To address these limitations, we propose a hybrid methodology that measures the similarity between patents by combining their semantic similarity, their technical similarity, and their bibliographic information. Using NLP techniques, we measure semantic similarity from the patent text and compute technical similarity from the degree of co-occurrence of International Patent Classification (IPC) codes. Bibliographic similarity is computed from patent-specific attributes: citation information, inventor information, and assignee information. We further propose a model that assigns reasonable weights to each similarity component. With the help of experts, we performed manual similarity evaluations on 420 patent pairs and evaluated our model against this data, empirically showing that our method outperforms recent natural language processing techniques.
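A minimal sketch of the weighted combination described above; the weights and example values are placeholders, not the ones assigned in the paper.

def hybrid_patent_similarity(semantic_sim, technical_sim, biblio_sim,
                             weights=(0.5, 0.3, 0.2)):
    """Weighted combination of the three similarity components.

    semantic_sim : similarity of patent text (e.g. cosine similarity of embeddings)
    technical_sim: degree of co-occurrence of IPC codes
    biblio_sim   : similarity of citations, inventors, and assignees
    """
    w_sem, w_tech, w_bib = weights
    return w_sem * semantic_sim + w_tech * technical_sim + w_bib * biblio_sim

# Example: two patents with high textual but low bibliographic overlap.
print(hybrid_patent_similarity(0.82, 0.40, 0.10))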
Music Playlist Title Generation Using Artist Information
Kim, Haven, Doh, SeungHeon, Lee, Junwon, Nam, Juhan
Automatically generating or captioning music playlist titles given a set of tracks is of significant interest in music streaming services, as customized playlists are widely used in personalized music recommendation and well-composed text titles attract users and aid their music discovery. We present an encoder-decoder model that generates a playlist title from a sequence of music tracks. While previous work takes track IDs as tokenized input for playlist title generation, we use the artist IDs corresponding to the tracks to mitigate the issue arising from the long-tail distribution of tracks in the playlist dataset. We also introduce a chronological data split method to handle newly released tracks in real-world scenarios. Comparing track IDs and artist IDs as input sequences, we show that the artist-based approach significantly enhances performance in terms of word overlap, semantic relevance, and diversity.
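A minimal sketch of the chronological split idea, assuming each playlist carries a creation date; the column names and cutoff date are illustrative, not those of the actual dataset.

import pandas as pd

def chronological_split(playlists: pd.DataFrame, cutoff="2020-01-01"):
    """Split playlists by creation date so the test set simulates
    newly released tracks encountered after training."""
    playlists = playlists.sort_values("created_at")
    train = playlists[playlists["created_at"] < cutoff]
    test = playlists[playlists["created_at"] >= cutoff]
    return train, test

# Each playlist row holds a title and a sequence of track/artist IDs; training on
# artist-ID sequences reduces the sparsity caused by the long tail of rare tracks.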