Train Your Own Variational Auto-Encoder for Sound Generation with AWS SageMaker
Entering the 22nd of 150 epochs after 10 hours of training, I realized the 3,000-file wav dataset was a bit tough to swallow for my 5-year-old MacBook Pro. The Free Spoken Digit Dataset (FSDD) contains recordings from 6 speakers, with 50 recordings of each digit per speaker, stored as 8 kHz .wav files. As I was following along with the outstanding video series Sound Generation With Neural Networks by Valerio Velardo, I found myself stuck in an endless training phase. The goal is to train a custom-made Variational Auto-Encoder to generate spoken-digit sounds. The preprocessing of the FSDD wav files was performed locally and generated a training dataset of 3,000 spectrograms in .npy format.
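To make the preprocessing step concrete, here is a minimal sketch of turning one 8 kHz recording into a normalised log-spectrogram saved as .npy. It is not the exact pipeline from the video series: the frame size (512), hop length (256), Hann window, min-max normalisation, and the output file name are all assumptions chosen for illustration, and a synthetic sine wave stands in for a real FSDD recording so the snippet runs without the dataset.

```python
import os
import tempfile

import numpy as np

SAMPLE_RATE = 8000  # FSDD recordings are 8 kHz (from the article)
FRAME_SIZE = 512    # assumption: STFT window length, not stated in the article
HOP_LENGTH = 256    # assumption: STFT hop length, not stated in the article


def log_spectrogram(signal, frame_size=FRAME_SIZE, hop_length=HOP_LENGTH):
    """Return a log-magnitude spectrogram of shape (frame_size // 2 + 1, n_frames)."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_length
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_size] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of each frame, transposed to (freq_bins, frames).
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # Convert to decibels; the small constant avoids log(0).
    return 20.0 * np.log10(magnitude + 1e-10)


def min_max_normalise(array):
    """Scale values linearly into [0, 1]."""
    lo, hi = array.min(), array.max()
    return (array - lo) / (hi - lo)


# Stand-in for one recording: half a second of a 440 Hz tone (hypothetical data).
t = np.arange(int(SAMPLE_RATE * 0.5)) / SAMPLE_RATE
signal = np.sin(2 * np.pi * 440.0 * t)

spectrogram = min_max_normalise(log_spectrogram(signal))

# Save the spectrogram as .npy, the format used for the training set.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "example_digit.npy")  # hypothetical file name
np.save(out_path, spectrogram)

# Dataset size implied by the article: 6 speakers x 10 digits x 50 recordings.
n_files = 6 * 10 * 50
print(spectrogram.shape, n_files)
```

Running the same function over every wav file would yield one .npy array per recording, which matches the 3,000 spectrograms mentioned above. In practice the recordings would also need padding or trimming to a common length so that all spectrograms share one shape.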
Jul-23-2021, 15:55:22 GMT