wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Neural Information Processing Systems

We show for the first time that learning powerful representations from speech audio alone, followed by fine-tuning on transcribed speech, can outperform the best semi-supervised methods while being conceptually simpler.