Chae, Yunkee
Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control
Chae, Yunkee, Shin, Eunsik, Hwang, Suntae, Paik, Seungryeol, Lee, Kyogu
Lyrics generation presents unique challenges, particularly in achieving precise syllable control while adhering to song form structures such as verses and choruses. Conventional line-by-line approaches often lead to unnatural phrasing, underscoring the need for more granular syllable management. We propose a song form-aware framework for lyrics generation that enables multi-level syllable control at the word, phrase, line, and paragraph levels. Our approach generates complete lyrics conditioned on input text and song form, ensuring alignment with the specified syllable constraints. Generated lyrics samples are available at: https://tinyurl.com/lyrics9999
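To make the multi-level constraint concrete, here is a minimal sketch of how syllable counts at the word, line, and paragraph levels might be tallied when checking generated lyrics against their targets; the vowel-group heuristic and all function names are illustrative assumptions, not the paper's actual syllable counter.

```python
import re

def naive_syllables(word: str) -> int:
    """Crude syllable estimate: count vowel groups (an illustrative heuristic only)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def syllable_counts(paragraph: str) -> dict:
    """Multi-level counts: per word, per line, and for the whole paragraph.
    Phrase-level counts would sum naive_syllables over the words of each phrase."""
    lines = [line for line in paragraph.splitlines() if line.strip()]
    per_line = [sum(naive_syllables(w) for w in line.split()) for line in lines]
    return {
        "word": {w: naive_syllables(w) for line in lines for w in line.split()},
        "line": per_line,
        "paragraph": sum(per_line),
    }

verse = "shadows fall across the floor\nand i am waiting by the door"
print(syllable_counts(verse))
```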
VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression
Chae, Yunkee, Choi, Woosung, Takida, Yuhta, Koo, Junghyun, Ikemiya, Yukara, Zhong, Zhi, Cheuk, Kin Wai, Martínez-Ramírez, Marco A., Lee, Kyogu, Liao, Wei-Hsiang, Mitsufuji, Yuki
Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of the rate-distortion tradeoff, particularly for simple input audio such as silence. To address this limitation, we propose variable bitrate RVQ (VRVQ) for audio codecs, which enables more efficient coding by adapting the number of codebooks used per frame. Furthermore, we propose a gradient estimation method for the non-differentiable masking operation that transforms the importance map into a binary importance mask, improving model training via a straight-through estimator. We demonstrate that the proposed training framework achieves superior results compared to the baseline method and shows further improvement when applied to the current state-of-the-art codec.
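The straight-through estimator mentioned above can be illustrated in a few lines of PyTorch. This is only a minimal sketch of STE-based binarization of a per-frame importance map, assuming a sigmoid-activated map and a fixed threshold; it is not the paper's exact gradient estimation method.

```python
import torch

def ste_binarize(importance: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Hard 0/1 mask in the forward pass; identity (straight-through) gradient backward."""
    hard = (importance > threshold).float()              # non-differentiable thresholding
    return importance + (hard - importance).detach()     # forward value = hard, gradient = d(importance)

# Toy usage: a per-frame importance map gating how many RVQ codebooks a frame uses.
logits = torch.randn(100, requires_grad=True)            # hypothetical per-frame importance logits
importance = torch.sigmoid(logits)
mask = ste_binarize(importance)                          # 1 = keep the extra codebooks, 0 = drop them
mask.sum().backward()                                    # gradients still reach the logits
print(logits.grad is not None)                           # True
```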
Exploiting Time-Frequency Conformers for Music Audio Enhancement
Chae, Yunkee, Koo, Junghyun, Lee, Sungho, Lee, Kyogu
With the proliferation of video platforms on the internet, recording musical performances on mobile devices has become commonplace. However, these recordings often suffer from degradations such as noise and reverberation, which negatively impact the listening experience. Consequently, the need for music audio enhancement (hereafter referred to as music enhancement), i.e., transforming degraded audio recordings into pristine, high-quality music, has grown. To address this issue, we propose a music enhancement system based on the Conformer architecture, which has demonstrated outstanding performance in speech enhancement tasks. We explore the Conformer's attention mechanisms and evaluate their performance to identify the best configuration for the music enhancement task. Our experimental results show that the proposed model achieves state-of-the-art performance on single-stem music enhancement. Furthermore, our system can perform general music enhancement on multi-track mixtures, which has not been examined in previous work.
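As a rough illustration of attending over both axes of a time-frequency representation, the sketch below applies self-attention separately along the time and frequency dimensions of a spectrogram feature map. The tensor layout, module names, and the use of plain multi-head attention are assumptions for illustration; the paper's Conformer blocks are more elaborate.

```python
import torch
import torch.nn as nn

class TimeFreqAttention(nn.Module):
    """Self-attention applied along the time axis and then the frequency axis of a
    spectrogram feature map shaped (batch, freq, time, channels) -- a sketch only."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.freq_attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, f, t, c = x.shape
        xt = x.reshape(b * f, t, c)                        # attend over time, per frequency bin
        xt, _ = self.time_attn(xt, xt, xt)
        x = x + xt.reshape(b, f, t, c)
        xf = x.permute(0, 2, 1, 3).reshape(b * t, f, c)    # attend over frequency, per time frame
        xf, _ = self.freq_attn(xf, xf, xf)
        return x + xf.reshape(b, t, f, c).permute(0, 2, 1, 3)

# Toy usage: 2 feature maps, 64 frequency bins, 100 frames, 32 channels.
block = TimeFreqAttention(32)
out = block(torch.randn(2, 64, 100, 32))
print(out.shape)   # torch.Size([2, 64, 100, 32])
```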
Self-refining of Pseudo Labels for Music Source Separation with Noisy Labeled Data
Koo, Junghyun, Chae, Yunkee, Jeon, Chang-Bin, Lee, Kyogu
Music source separation (MSS) faces challenges due to the limited availability of correctly labeled individual instrument tracks. As larger datasets are acquired to improve MSS performance, encountering mislabeled individual instrument tracks becomes inevitable and poses a significant challenge. This paper introduces an automated technique for refining the labels in a partially mislabeled dataset. Our proposed self-refining technique, applied to a noisy-labeled dataset, results in only a 1% accuracy degradation in multi-label instrument recognition compared to a classifier trained on a clean-labeled dataset. The study demonstrates the importance of refining noisy-labeled data in MSS model training and shows that training on the refined dataset yields results comparable to those obtained with a clean-labeled dataset. Notably, when only a noisy dataset is available, MSS models trained on a self-refined dataset even outperform those trained on a dataset refined with a classifier trained on clean labels.
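A generic pseudo-label self-refinement loop can be sketched as alternating between fitting a multi-label instrument classifier on the current labels and overwriting those labels with its thresholded predictions. The precomputed embeddings, linear classifier, and hyperparameters below are illustrative assumptions rather than the paper's actual pipeline.

```python
import torch
import torch.nn as nn

def self_refine(model: nn.Module,
                features: torch.Tensor,      # (N, D) precomputed track embeddings (assumption)
                noisy_labels: torch.Tensor,  # (N, C) multi-hot instrument labels, partially wrong
                rounds: int = 3,
                epochs: int = 50,
                threshold: float = 0.5) -> torch.Tensor:
    """Alternately fit the classifier on the current labels, then overwrite the labels
    with its own thresholded predictions."""
    labels = noisy_labels.clone().float()
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(rounds):
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(epochs):                            # fit on the current (possibly noisy) labels
            opt.zero_grad()
            loss_fn(model(features), labels).backward()
            opt.step()
        with torch.no_grad():                              # relabel with the classifier's predictions
            labels = (torch.sigmoid(model(features)) > threshold).float()
    return labels

# Toy usage: 200 tracks, 32-dim embeddings, 4 instrument classes.
model = nn.Linear(32, 4)
features = torch.randn(200, 32)
noisy_labels = (torch.rand(200, 4) > 0.7).float()
refined = self_refine(model, features, noisy_labels)
print(refined.shape)
```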
Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test
Kim, Eungbeom, Chae, Yunkee, Sim, Jaeheon, Lee, Kyogu
Automatic speech recognition (ASR) systems based on deep learning are mainly trained under empirical risk minimization (ERM). Since ERM averages performance over data samples regardless of group membership, such as healthy or dysarthric speakers, ASR systems are unaware of performance disparities across groups. This results in biased ASR systems with severe performance differences among groups. In this study, we aim to improve the ASR system in terms of group robustness for dysarthric speakers. To achieve this goal, we present a novel approach, sample reweighting with sample affinity test (Re-SAT). Re-SAT systematically measures the debiasing helpfulness of a given data sample and then mitigates bias through helpfulness-based sample reweighting. Experimental results demonstrate that Re-SAT improves ASR performance on dysarthric speech without degrading performance on healthy speech.
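The contrast between plain ERM and helpfulness-based sample reweighting can be sketched as a weighted per-sample loss. The helpfulness scores, their normalization into weights, and the toy data below are assumptions for illustration; the paper's sample affinity test is not reproduced here.

```python
import torch
import torch.nn.functional as F

def reweighted_loss(logits: torch.Tensor,
                    targets: torch.Tensor,
                    helpfulness: torch.Tensor) -> torch.Tensor:
    """Weighted ERM: up-weight samples judged more helpful for debiasing.
    Plain ERM would simply be per_sample.mean()."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.softmax(helpfulness, dim=0) * len(per_sample)   # keeps the average weight near 1
    return (weights * per_sample).mean()

# Toy usage with random data: 10 utterances, 5 output classes.
logits = torch.randn(10, 5, requires_grad=True)
targets = torch.randint(0, 5, (10,))
helpfulness = torch.randn(10)        # hypothetical per-sample debiasing-helpfulness scores
loss = reweighted_loss(logits, targets, helpfulness)
loss.backward()
print(float(loss))
```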