Improving Real-Time Music Accompaniment Separation with MMDenseNet

Wang, Chun-Hsiang, Wang, Chung-Che, Wang, Jun-You, Jang, Jyh-Shing Roger, Chu, Yen-Hsun

arXiv.org Artificial Intelligence

Music source separation aims to separate polyphonic music into different types of sources. Most existing methods focus on improving the quality of the separated results by using larger model structures, rendering them unsuitable for deployment on edge devices. Moreover, these methods may produce low-quality output when the input duration is short, making them impractical for real-time applications. The goal of this paper is therefore to enhance a lightweight model, MMDenseNet, to strike a balance between separation quality and latency for real-time applications. Several directions of improvement are explored or proposed, including the complex ideal ratio mask, self-attention, a band-merge-split method, and feature look-back. Source-to-distortion ratio, real-time factor, and optimal latency are employed to evaluate performance. To align with our application requirements, the evaluation in this paper focuses on the separation performance of the accompaniment part. Experimental results demonstrate that our improvements achieve low real-time factor and optimal latency while maintaining acceptable separation quality.
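The complex ideal ratio mask (cIRM) mentioned above is a standard target in mask-based separation: instead of a real-valued magnitude mask, the model predicts a complex-valued ratio between the source and mixture spectrograms, which corrects phase as well as magnitude. A minimal NumPy sketch of the idea (the function names and the `eps` stabilizer are illustrative, not from the paper):

```python
import numpy as np

def complex_ideal_ratio_mask(mix_stft, src_stft, eps=1e-8):
    """Oracle cIRM: complex ratio of source to mixture STFT.

    Computed as src * conj(mix) / |mix|^2, which equals src / mix
    wherever the mixture is nonzero; eps avoids division by zero.
    """
    denom = np.abs(mix_stft) ** 2 + eps
    return src_stft * np.conj(mix_stft) / denom

def apply_mask(mask, mix_stft):
    # Estimated source spectrogram: element-wise complex multiplication,
    # which scales the magnitude and rotates the phase of each bin.
    return mask * mix_stft
```

In training, a network predicts the real and imaginary parts of this mask from the mixture; with the oracle mask, applying it recovers the source spectrogram almost exactly.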


Spectrogram Feature Losses for Music Source Separation

Sahai, Abhimanyu, Weber, Romann, McWilliams, Brian

arXiv.org Machine Learning

Abstract: In this paper we study deep learning-based music source separation, and explore using an alternative loss to the standard spectrogram pixel-level L2 loss for model training. Our main contribution is demonstrating that adding a high-level feature-loss term, extracted from the spectrograms using a VGG net, can improve separation quality vis-a-vis a pure pixel-level loss. We show this improvement in the context of MMDenseNet, a state-of-the-art deep learning model for this task, for the extraction of drums and vocal sounds from songs in the musdb18 database, covering a broad range of western music genres. We believe that this finding can be generalized and applied to broader machine learning-based systems in the audio domain. I. INTRODUCTION Music source separation is a problem that has been studied for a few decades now: given an audio track with several instruments mixed together (a regular MP3 file, for example), how can it be separated into its component instruments? The obvious application of this problem is in music production: creating karaoke tracks, highlighting select instruments in an audio playback, etc.
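The combined objective described above pairs a pixel-level L2 term on the spectrograms with a feature-space L2 term computed on activations of a pretrained network. A hedged sketch of the loss structure (NumPy; the paper uses VGG activations, which a fixed random linear map plus ReLU stands in for here purely as a hypothetical placeholder, and the `weight` hyperparameter is illustrative):

```python
import numpy as np

def combined_spectrogram_loss(spec_pred, spec_true, weight=0.1, seed=0):
    """Pixel-level L2 plus a high-level feature L2 term.

    spec_pred, spec_true: (time, freq) magnitude spectrograms.
    The 'feature extractor' here is a fixed random projection + ReLU,
    a stand-in for the pretrained VGG features used in the paper.
    """
    # Standard pixel-level L2 term over all spectrogram bins.
    pixel_l2 = np.mean((spec_pred - spec_true) ** 2)

    # Stand-in feature extractor: fixed weights so the "features" are
    # deterministic, analogous to freezing a pretrained network.
    rng = np.random.default_rng(seed)
    n_freq = spec_pred.shape[-1]
    W = rng.normal(size=(n_freq, 32)) / np.sqrt(n_freq)
    feat_pred = np.maximum(spec_pred @ W, 0.0)
    feat_true = np.maximum(spec_true @ W, 0.0)
    feature_l2 = np.mean((feat_pred - feat_true) ** 2)

    return pixel_l2 + weight * feature_l2
```

The design intuition is that pixel-level L2 penalizes every bin equally, while the feature term emphasizes perceptually salient structure captured by the pretrained network's activations.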