Using recurrences in time and frequency within U-net architecture for speech enhancement
Grzywalski, Tomasz; Drgas, Szymon
ABSTRACT
When designing a fully convolutional neural network, there is a tradeoff between receptive field size, number of parameters and spatial resolution of features in the deeper layers of the network. In this work we present a novel network design, based on a combination of many convolutional and recurrent layers, that resolves this tradeoff. We compare our solution with U-net-based models known from the literature and with other baseline models on a speech enhancement task. We test our solution on TIMIT speech utterances combined with noise segments extracted from the NOISEX-92 database and show a clear advantage of the proposed solution in terms of SDR (signal-to-distortion ratio), SIR (signal-to-interference ratio) and STOI (short-time objective intelligibility) metrics compared to the current state of the art.

Index Terms: deep learning, speech enhancement, U-nets

1. INTRODUCTION
The single-channel speech enhancement problem is to reduce the noise present in a single-channel recording of speech.
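To make the problem statement concrete, the following minimal sketch builds a noisy single-channel signal by adding noise to clean speech at a chosen signal-to-noise ratio, then scores it with a simplified energy-ratio SDR. Everything here is an illustrative assumption: the sine-wave "speech", the Gaussian noise, the 5 dB SNR, and the helper names `power`, `mix_at_snr` and `simple_sdr_db` are hypothetical, and the simplified SDR is not necessarily the SDR definition or the mixing procedure used in the paper.

```python
import math
import random


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def simple_sdr_db(reference, estimate):
    """Ratio of reference energy to residual-error energy, in dB.

    Simplified stand-in for illustration only; evaluation toolkits
    typically use a more elaborate SDR definition.
    """
    error = [e - r for r, e in zip(reference, estimate)]
    return 10 * math.log10(power(reference) / power(error))


rng = random.Random(0)
num_samples = 16000  # one second at an assumed 16 kHz sampling rate

# Toy stand-ins for a clean utterance and a noise segment (assumptions).
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(num_samples)]
noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

noisy = mix_at_snr(speech, noise, snr_db=5.0)
print(round(simple_sdr_db(speech, noisy), 2))
```

With this simplified measure, the unprocessed mixture scores the input SNR by construction, so an enhancement model is useful only if its output scores higher than that baseline.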
Nov-16-2018