Spectrogram Feature Losses for Music Source Separation

Abhimanyu Sahai, Romann Weber, Brian McWilliams

January 18, 2019
Abstract--In this paper we study deep learning-based music source separation and explore an alternative to the standard pixel-level L2 loss on spectrograms for model training. Our main contribution is to demonstrate that adding a high-level feature loss term, extracted from the spectrograms using a VGG net, can improve separation quality vis-à-vis a pure pixel-level loss. We show this improvement in the context of MMDenseNet, a state-of-the-art deep learning model for this task, for the extraction of drums and vocal sounds from songs in the musdb18 database, which covers a broad range of western music genres. We believe that this finding can be generalized and applied to broader machine learning-based systems in the audio domain.

I. INTRODUCTION

Music source separation is a problem that has been studied for several decades: given an audio track with several instruments mixed together (a regular MP3 file, for example), how can it be separated into its component instruments? The obvious application of this problem is in music production: creating karaoke tracks, highlighting select instruments in an audio playback, etc.
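To make the combined objective concrete, the following is a minimal sketch of a training loss that adds a VGG-based feature term to the pixel-level L2 term on magnitude spectrograms. The specific choices here (torchvision's pretrained VGG16, truncating its feature extractor after the third convolutional block, and the weighting factor feature_weight) are illustrative assumptions, not the paper's exact configuration.

    import torch
    import torch.nn as nn
    from torchvision.models import vgg16

    class SpectrogramFeatureLoss(nn.Module):
        """Pixel-level L2 loss on spectrograms plus an L2 loss on VGG features."""

        def __init__(self, feature_weight=0.1, layer_index=16):
            super().__init__()
            # Frozen VGG16 feature extractor. Truncating at index 16 (end of
            # the conv3 block) is an assumption; the paper does not pin down
            # the layer here.
            self.vgg = vgg16(pretrained=True).features[:layer_index].eval()
            for p in self.vgg.parameters():
                p.requires_grad = False
            self.l2 = nn.MSELoss()
            self.feature_weight = feature_weight

        def forward(self, est_spec, target_spec):
            # est_spec, target_spec: (batch, 1, freq, time) magnitude
            # spectrograms. VGG expects 3 input channels, so the single
            # channel is tiled.
            pixel_loss = self.l2(est_spec, target_spec)
            est_feat = self.vgg(est_spec.repeat(1, 3, 1, 1))
            target_feat = self.vgg(target_spec.repeat(1, 3, 1, 1))
            feature_loss = self.l2(est_feat, target_feat)
            return pixel_loss + self.feature_weight * feature_loss

In this sketch, setting feature_weight to zero recovers the standard pure pixel-level L2 baseline, so the feature term can be ablated directly against it.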