HuBERT Explained
The HuBERT model architecture follows the wav2vec 2.0 architecture and is built from several components, the number of which varies between the base, large and x-large variations. Each component and its task is explained alongside the training loop. The first training step consists of discovering the hidden units, and the process begins with extracting MFCCs (Mel-frequency cepstral coefficients) from the audio waveform. These are low-level acoustic features useful for representing speech. The MFCC vector of each audio segment is then passed to the K-means clustering algorithm and assigned to one of K clusters.
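The clustering step can be illustrated with a minimal sketch. It is not the HuBERT implementation: MFCC extraction is skipped, the two-dimensional `frames` are made-up stand-ins for real MFCC vectors, and the helper names (`squared_distance`, `nearest`, `kmeans`) are hypothetical. It only shows how K-means turns one feature vector per audio segment into a cluster ID that can serve as a hidden unit.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame, centroids):
    """Index of the centroid closest to the given frame."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(frame, centroids[k]))


def kmeans(frames, k, iterations=20, seed=0):
    """Plain K-means: returns (centroids, one cluster ID per frame)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct frames.
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iterations):
        labels = [nearest(f, centroids) for f in frames]
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    labels = [nearest(f, centroids) for f in frames]
    return centroids, labels


# Toy stand-ins for per-segment MFCC vectors (assumption: real MFCCs
# have more dimensions and come from an audio feature extractor).
frames = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],   # segments of one sound
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],   # segments of another sound
]

centroids, hidden_units = kmeans(frames, k=2)
print(hidden_units)  # one cluster ID (hidden unit) per segment
```

In this sketch the first three segments share one cluster ID and the last three share the other; which ID goes to which group depends on the random initialisation.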
Dec-21-2021, 12:05:59 GMT