Deep Learning for Audio Signal Processing

Purwins, Hendrik, Li, Bo, Virtanen, Tuomas, Schlüter, Jan, Chang, Shuo-yiin, Sainath, Tara

arXiv.org Machine Learning 

Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

H. Purwins is with Department of Architecture, Design & Media Technology, [...]

Manuscript received October 11, 2018

This is a PREPRINT

JOURNAL OF SELECTED TOPICS OF SIGNAL PROCESSING, VOL. [...]

Abstract--Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. [...] Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic [...]

Figure (caption fragment): [...] The number of labels to be predicted (left), and the type of each label (right).

I. INTRODUCTION

[...] [2] in 1986, and finally 3) the success of deep learning in speech recognition [3] and image classification [4] in 2012, leading to a renaissance of deep learning, involving e.g. [...] networks (CNNs, [6]) and long short-term memory (LSTM, [7]). In this "deep" paradigm, architectures with a large number of parameters are trained to learn from a massive amount of [...] many areas of signal processing, often outperforming traditional signal processing on a large scale. In this most recent wave, deep learning first gained traction in image processing [4], but was then widely adopted in speech processing, music [...]

While many deep learning methods have been adopted from [...] Audio signals are commonly transformed into two-dimensional time-frequency representations for processing, but the two axes, [...] Images are instantaneous snapshots of a target and often analyzed as a whole or in patches with little order constraints; however audio signals have to be studied sequentially in chronological order.

METHODS

To set the stage, we give a conceptual overview of audio analysis and synthesis problems (II-A), the input representations commonly used to address them (II-B), and the models shared between different application fields (II-C).

This division encompasses two independent axes (cf. [...] While the audio signal will often be processed into a sequence of features, we consider this part of the solution, not of the task.
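As an illustration of the two-dimensional time-frequency representations mentioned above, the following minimal sketch computes a log-magnitude spectrogram with NumPy. It is not taken from the article: the function name `log_spectrogram`, the frame length of 400 samples, the hop of 160 samples, the Hann window, and the 16 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a (num_frames, frame_len // 2 + 1) log-magnitude spectrogram.

    Hypothetical helper for illustration; the parameter values are
    assumptions, not settings taken from the article.
    """
    window = np.hanning(frame_len)
    # Number of complete frames that fit into the signal.
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # One real-valued FFT per frame: rows are time, columns are frequency.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # eps avoids taking the logarithm of zero in silent bins.
    return np.log(magnitude + eps)


# One second of a 440 Hz sine tone sampled at 16 kHz (assumed test input).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(tone)
print(spec.shape)
```

The rows of `spec` follow chronological order while the columns index frequency, so the two axes carry different meanings, unlike the horizontal and vertical axes of an image.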
