A Gentle Introduction to Audio Classification With TensorFlow
We have seen many recent advances in deep learning for vision and language. It is intuitive why CNNs perform well on images, given the local correlation between pixels, and why sequential models such as RNNs or transformers perform well on language, given its sequential nature. But what about audio? In this article you will learn how to approach a simple audio classification problem, some of the common and efficient methods used, and the TensorFlow code to implement them.

Disclaimer: The code presented here is based on my work for the "Rainforest Connection Species Audio Detection" Kaggle competition, but for demonstration purposes I will use the "Speech Commands" dataset.

Audio files usually come in the ".wav" format and are commonly referred to as waveforms. A waveform is a time series that holds the signal amplitude at each point in time; plotting one of those samples shows amplitude on the vertical axis against time on the horizontal axis.

Intuitively, one might consider modeling this data like a regular time series (e.g. stock price forecasting) using some kind of RNN model. This can be done, but since we are working with audio signals, a more appropriate choice is to transform the waveform samples into spectrograms. A spectrogram is an image representation of the waveform signal: it shows the intensity of each frequency over time, which makes it very useful for evaluating how the signal's frequency distribution changes over time.
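The waveform-to-spectrogram step described above can be sketched with TensorFlow's short-time Fourier transform, `tf.signal.stft`. This is a minimal illustration, not the article's own code: the helper name `waveform_to_spectrogram`, the 16 kHz sample rate, the `frame_length` and `frame_step` values, and the synthetic 440 Hz tone standing in for a real ".wav" clip are all assumptions made for the example.

```python
import math

import tensorflow as tf

SAMPLE_RATE = 16000  # assumed sample rate, in samples per second


def waveform_to_spectrogram(waveform, frame_length=255, frame_step=128):
    """Convert a 1-D float waveform into a magnitude spectrogram.

    Hypothetical helper; the window sizes are illustrative defaults.
    """
    # Short-time Fourier transform: slide a window over the signal and
    # take an FFT of each frame. The result is complex-valued with
    # shape [time frames, frequency bins].
    stft = tf.signal.stft(
        waveform, frame_length=frame_length, frame_step=frame_step
    )
    # Keep only the magnitude of each frequency bin; phase is discarded.
    return tf.abs(stft)


# Synthetic stand-in for a one-second audio clip: a 440 Hz sine tone.
t = tf.range(SAMPLE_RATE, dtype=tf.float32) / SAMPLE_RATE
waveform = tf.sin(2.0 * math.pi * 440.0 * t)

spectrogram = waveform_to_spectrogram(waveform)
print(spectrogram.shape)  # (time frames, frequency bins)
```

To feed the result to an image-style model such as a CNN, a channel axis is typically added first, for example with `spectrogram[..., tf.newaxis]`.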
May-6-2021, 09:00:04 GMT