A Gentle Introduction to Audio Classification With TensorFlow

May-6-2021, 09:00:04 GMT–#artificialintelligence

Deep learning has made many recent advances in vision and language. It is intuitive why CNNs perform well on images, where neighboring pixels are locally correlated, and why sequential models such as RNNs or transformers perform well on language, which is sequential by nature. But what about audio? In this article you will learn how to approach a simple audio classification problem, some of the common and efficient methods used, and the TensorFlow code to do it.

Disclaimer: The code presented here is based on my work for the "Rainforest Connection Species Audio Detection" Kaggle competition, but for demonstration purposes I will use the "Speech Commands" dataset.

Audio files usually come in the ".wav" format and are commonly referred to as waveforms. A waveform is a time series holding the signal amplitude at each point in time; plotted, it shows amplitude on the vertical axis against time on the horizontal axis. Intuitively, one might model this data like a regular time series (e.g. stock price forecasting) using some kind of RNN model. This could in fact be done, but since we are working with audio signals, a more appropriate choice is to transform the waveform samples into spectrograms. A spectrogram is an image representation of the waveform signal that shows the intensity of each frequency over time, which makes it useful for evaluating how the signal's frequency distribution changes over time.
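As a minimal sketch of the waveform-to-spectrogram step, the snippet below builds a synthetic one-second tone in place of a real recording and converts it with a short-time Fourier transform. The 16 kHz sample rate, the 440 Hz tone, the `frame_length` and `frame_step` values, and the helper names `make_tone` and `waveform_to_spectrogram` are all assumptions chosen for illustration, not settings taken from the original article.

```python
import numpy as np
import tensorflow as tf

SAMPLE_RATE = 16000  # assumed sample rate in Hz (illustrative choice)
TONE_HZ = 440.0      # assumed frequency of the synthetic test tone


def make_tone(freq_hz=TONE_HZ, seconds=1.0, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: a pure sine wave standing in for a real recording."""
    t = np.arange(int(seconds * sample_rate)) / sample_rate
    samples = np.sin(2.0 * np.pi * freq_hz * t).astype(np.float32)
    return tf.constant(samples)


def waveform_to_spectrogram(waveform, frame_length=255, frame_step=128):
    """Hypothetical helper: magnitude spectrogram via the short-time Fourier transform."""
    # tf.signal.stft slices the waveform into overlapping windowed frames
    # and applies an FFT to each one, returning complex values.
    stft = tf.signal.stft(waveform, frame_length=frame_length, frame_step=frame_step)
    # Keep only the magnitude; the phase is discarded.
    return tf.abs(stft)


waveform = make_tone()
spectrogram = waveform_to_spectrogram(waveform)  # shape: (time frames, frequency bins)

# Average over time and find the frequency bin that carries the most energy.
peak_bin = int(tf.argmax(tf.reduce_mean(spectrogram, axis=0)).numpy())
```

The result is a 2D tensor that can be treated like a single-channel image, with time along one axis and frequency along the other. For a real ".wav" file, the bytes returned by `tf.io.read_file` can be decoded into a waveform tensor with `tf.audio.decode_wav` before applying the same transformation.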
