AITopics | Chang, Sungkyun

Collaborating Authors

Chang, Sungkyun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ImprovNet: Generating Controllable Musical Improvisations with Iterative Corruption Refinement

Bhandari, Keshav, Chang, Sungkyun, Lu, Tongyu, Enus, Fareza R., Bradshaw, Louis B., Herremans, Dorien, Colton, Simon

arXiv.org Artificial IntelligenceFeb-6-2025

Deep learning has enabled remarkable advances in style transfer across various domains, offering new possibilities for creative content generation. However, in the realm of symbolic music, generating controllable and expressive performance-level style transfers for complete musical works remains challenging due to limited datasets, especially for genres such as jazz, and the lack of unified models that can handle multiple music generation tasks. This paper presents ImprovNet, a transformer-based architecture that generates expressive and controllable musical improvisations through a self-supervised corruption-refinement training strategy. ImprovNet unifies multiple capabilities within a single model: it can perform cross-genre and intra-genre improvisations, harmonize melodies with genre-specific styles, and execute short prompt continuation and infilling tasks. The model's iterative generation framework allows users to control the degree of style transfer and structural similarity to the original composition. Objective and subjective evaluations demonstrate ImprovNet's effectiveness in generating musically coherent improvisations while maintaining structural relationships with the original pieces. The model outperforms Anticipatory Music Transformer in short continuation and infilling tasks and successfully achieves recognizable genre conversion, with 79\% of participants correctly identifying jazz-style improvisations. Our code and demo page can be found at https://github.com/keshavbhandari/improvnet.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.04522

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation

Chang, Sungkyun, Benetos, Emmanouil, Kirchhoff, Holger, Dixon, Simon

arXiv.org Artificial IntelligenceJul-5-2024

Multi-instrument music transcription aims to convert polyphonic music recordings into musical scores assigned to each instrument. This task is challenging for modeling as it requires simultaneously identifying multiple instruments and transcribing their pitch and precise timing, and the lack of fully annotated data adds to the training difficulties. This paper introduces YourMT3+, a suite of models for enhanced multi-instrument music transcription based on the recent language token decoding approach of MT3. We strengthen its encoder by adopting a hierarchical attention transformer in the time-frequency domain and integrating a mixture of experts (MoE). To address data limitations, we introduce a new multi-channel decoding method for training with incomplete annotations and propose intra- and cross-stem augmentation for dataset mixing. Our experiments demonstrate direct vocal transcription capabilities, eliminating the need for voice separation pre-processors. Benchmarks across ten public datasets show our models' competitiveness with, or superiority to, existing transcription models. Further testing on pop music recordings highlights the limitations of current models. Fully reproducible code and datasets are available at \url{https://github.com/mimbres/YourMT3}

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2407.04822

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Offline Clustering Approach to Self-supervised Learning for Class-imbalanced Image Data

Chang, Hye-min, Chang, Sungkyun

arXiv.org Artificial IntelligenceDec-21-2022

Class-imbalanced datasets are known to cause the problem of model being biased towards the majority classes. In this project, we set up two research questions: 1) when is the class-imbalance problem more prevalent in self-supervised pre-training? and 2) can offline clustering of feature representations help pre-training on class-imbalanced data? Our experiments investigate the former question by adjusting the degree of {\it class-imbalance} when training the baseline models, namely SimCLR and SimSiam on CIFAR-10 database. To answer the latter question, we train each expert model on each subset of the feature clusters. We then distill the knowledge of expert models into a single model, so that we will be able to compare the performance of this model to our baselines.

artificial intelligence, machine learning, simsiam, (17 more...)

arXiv.org Artificial Intelligence

2212.11444

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Audio Cover Song Identification using Convolutional Neural Network

Chang, Sungkyun, Lee, Juheon, Choe, Sang Keun, Lee, Kyogu

arXiv.org Artificial IntelligenceOct-26-2020

In this paper, we propose a new approach to cover song identification using a CNN (convolutional neural network). Most previous studies extract the feature vectors that characterize the cover song relation from a pair of songs and used it to compute the (dis)similarity between the two songs. Based on the observation that there is a meaningful pattern between cover songs and that this can be learned, we have reformulated the cover song identification problem in a machine learning framework. To do this, we first build the CNN using as an input a cross-similarity matrix generated from a pair of songs. We then construct the data set composed of cover song pairs and non-cover song pairs, which are used as positive and negative training samples, respectively. The trained CNN outputs the probability of being in the cover song relation given a cross-similarity matrix generated from any two pieces of music and identifies the cover song by ranking on the probability. Experimental results show that the proposed algorithm achieves performance better than or comparable to the state-of-the-art.

arXiv.org Artificial Intelligence

1712.00166

Country:

North America > United States (0.14)
Asia > Taiwan (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.63)

Add feedback

Lyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive Patterns in Vowel Acoustics

Chang, Sungkyun, Lee, Kyogu

arXiv.org Artificial IntelligenceJan-24-2017

Most of the previous approaches to lyrics-to-audio alignment used a pre-developed automatic speech recognition (ASR) system that innately suffered from several difficulties to adapt the speech model to individual singers. A significant aspect missing in previous works is the self-learnability of repetitive vowel patterns in the singing voice, where the vowel part used is more consistent than the consonant part. Based on this, our system first learns a discriminative subspace of vowel sequences, based on weighted symmetric non-negative matrix factorization (WS-NMF), by taking the self-similarity of a standard acoustic feature as an input. Then, we make use of canonical time warping (CTW), derived from a recent computer vision technique, to find an optimal spatiotemporal transformation between the text and the acoustic sequences. Experiments with Korean and English data sets showed that deploying this method after a pre-developed, unsupervised, singing source separation achieved more promising results than other state-of-the-art unsupervised approaches and an existing ASR-based system.

alignment, speech recognition, survey article, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ACCESS.2017.2738558

1701.06078

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.86)

Add feedback