Is there a way to input spectro-temporal data into a self-organizing map in Python/TensorFlow? [Project] • r/MachineLearning
I'm not read up on speech processing (I focus on music), but that sounds really cool and useful. There seems to be a bunch of stuff on the topic. Anyway, if you have a dataset of speech audio files and corresponding labels marking the times at which stutters occur, then this would be a straightforward problem for an HMM or RNN, and you might not benefit from a separate dimensionality reduction step. As the classification problem is binary (I'm assuming a spectral frame is either fluent or not), it might be enough to do a standard STFT, turn the labels into a binary vector of the same time resolution, and train a baseline LSTM in Keras. I'd see how far that gets before looking into trickier stuff like filterbanks and CTC.
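For what it's worth, here is a rough sketch of the data preparation step described above: a plain magnitude STFT plus a binary label vector at the same frame rate. It uses only NumPy. The sample rate, frame length, hop size, the random placeholder audio, and the `intervals` list are all made-up assumptions for illustration, and `stft_magnitude` and `frame_labels` are hypothetical helper names, not part of any library.

```python
import numpy as np

def stft_magnitude(signal, frame_len=400, hop=160):
    """Magnitude STFT with a Hann window.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

def frame_labels(stutter_intervals, n_frames, sr, frame_len=400, hop=160):
    """Return 1 for frames whose centre lies in a stutter interval, else 0.

    stutter_intervals is a list of (start_seconds, end_seconds) pairs.
    """
    centres = (np.arange(n_frames) * hop + frame_len / 2) / sr
    labels = np.zeros(n_frames, dtype=np.int64)
    for start, end in stutter_intervals:
        labels[(centres >= start) & (centres < end)] = 1
    return labels

sr = 16000                              # assumed sample rate in Hz
rng = np.random.default_rng(0)
audio = rng.standard_normal(sr * 2)     # 2 s of noise standing in for speech
intervals = [(0.5, 0.8), (1.2, 1.3)]    # hypothetical stutter annotations

X = stft_magnitude(audio)               # (time, frequency) features
y = frame_labels(intervals, X.shape[0], sr)  # one 0/1 label per frame
```

Once batched, `X[None, ...]` has shape (batch, time, features) and `y[None, ..., None]` has shape (batch, time, 1), which should line up with a Keras LSTM layer using `return_sequences=True` followed by a single-unit sigmoid output, so each frame gets its own fluent/stutter prediction.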
Dec-1-2017, 17:15:49 GMT