AITopics | Cheuk, Kin Wai

Collaborating Authors

Cheuk, Kin Wai

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression

Chae, Yunkee, Choi, Woosung, Takida, Yuhta, Koo, Junghyun, Ikemiya, Yukara, Zhong, Zhi, Cheuk, Kin Wai, Martínez-Ramírez, Marco A., Lee, Kyogu, Liao, Wei-Hsiang, Mitsufuji, Yuki

arXiv.org Artificial IntelligenceOct-12-2024

Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of rate-distortion tradeoff, particularly in scenarios with simple input audio, such as silence. To address this limitation, we propose variable bitrate RVQ (VRVQ) for audio codecs, which allows for more efficient coding by adapting the number of codebooks used per frame. Furthermore, we propose a gradient estimation method for the non-differentiable masking operation that transforms from the importance map to the binary importance mask, improving model training via a straight-through estimator. We demonstrate that the proposed training framework achieves superior results compared to the baseline method and shows further improvement when applied to the current state-of-the-art codec.

artificial intelligence, importance map, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.06016

Country: Asia > Japan (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage

Tan, Hao Hao, Cheuk, Kin Wai, Cho, Taemin, Liao, Wei-Hsiang, Mitsufuji, Yuki

arXiv.org Artificial IntelligenceMar-15-2024

This paper presents enhancements to the MT3 model, a state-of-the-art (SOTA) token-based multi-instrument automatic music transcription (AMT) model. Despite SOTA performance, MT3 has the issue of instrument leakage, where transcriptions are fragmented across different instruments. To mitigate this, we propose MR-MT3, with enhancements including a memory retention mechanism, prior token sampling, and token shuffling are proposed. These methods are evaluated on the Slakh2100 dataset, demonstrating improved onset F1 scores and reduced instrument leakage. In addition to the conventional multi-instrument transcription F1 score, new metrics such as the instrument leakage ratio and the instrument detection F1 score are introduced for a more comprehensive assessment of transcription quality. The study also explores the issue of domain overfitting by evaluating MT3 on single-instrument monophonic datasets such as ComMU and NSynth. The findings, along with the source code, are shared to facilitate future work aimed at refining token-based multi-instrument AMT models.

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2403.10024

Country: Asia > Japan (0.14)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription

Cwitkowitz, Frank, Cheuk, Kin Wai, Choi, Woosung, Martínez-Ramírez, Marco A., Toyama, Keisuke, Liao, Wei-Hsiang, Mitsufuji, Yuki

arXiv.org Artificial IntelligenceJan-24-2024

In recent years, research on music transcription has focused mainly on architecture design and instrument-specific data acquisition. With the lack of availability of diverse datasets, progress is often limited to solo-instrument tasks such as piano transcription. Several works have explored multi-instrument transcription as a means to bolster the performance of models on low-resource tasks, but these methods face the same data availability issues. We propose Timbre-Trap, a novel framework which unifies music transcription and audio reconstruction by exploiting the strong separability between pitch and timbre. We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients, selecting between either output during the decoding stage via a simple switch mechanism. In this way, the model learns to produce coefficients corresponding to timbre-less audio, which can be interpreted as pitch salience. We demonstrate that the framework leads to performance comparable to state-of-the-art instrument-agnostic transcription methods, while only requiring a small amount of annotated data.

artificial intelligence, machine learning, transcription, (17 more...)

arXiv.org Artificial Intelligence

2309.15717

Country:

Asia (0.28)
North America > United States (0.14)

Genre: Research Report (0.64)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Add feedback

Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training

Cheuk, Kin Wai, Choi, Keunwoo, Kong, Qiuqiang, Li, Bochen, Won, Minz, Wang, Ju-Chiang, Hung, Yun-Ning, Herremans, Dorien

arXiv.org Artificial IntelligenceFeb-1-2023

In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results. The joint training of the transcription and source separation modules serves to improve the performance of both tasks. The instrument module is optional and can be directly controlled by human users. This makes Jointist a flexible user-controllable framework. Our challenging problem formulation makes the model highly useful in the real world given that modern popular music typically consists of multiple instruments. Its novelty, however, necessitates a new perspective on how to evaluate such a model. In our experiments, we assess the proposed model from various aspects, providing a new evaluation perspective for multi-instrument transcription. Our subjective listening study shows that Jointist achieves state-of-the-art performance on popular music, outperforming existing multi-instrument transcription models such as MT3. We conducted experiments on several downstream tasks and found that the proposed method improved transcription by more than 1 percentage points (ppt.), source separation by 5 SDR, downbeat detection by 1.8 ppt., chord recognition by 1.4 ppt., and key estimation by 1.4 ppt., when utilizing transcription results obtained from Jointist. Demo available at \url{https://jointist.github.io/Demo}.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2302.00286

Country:

Asia (0.28)
North America (0.28)

Genre: Research Report > New Finding (0.87)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Add feedback