Masked Autoencoders that Listen

Neural Information Processing Systems 

This paper studies a simple extension of image-based Masked Autoencoders (MAE) [1] to self-supervised representation learning from audio spectrograms.
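To make the idea concrete, here is a minimal sketch of the MAE-style masking step that is being carried over from images to audio. A spectrogram is treated as a 2-D (time × frequency) array, cut into non-overlapping square patches, and a random subset of those patches is hidden from the encoder. The patch size, the masking ratio, the helper names `patchify` and `random_mask`, and the plain-Python list representation are illustrative assumptions, not the paper's implementation. The encoder, decoder, and reconstruction loss are omitted.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def patchify(spec: Matrix, p: int) -> List[List[float]]:
    """Split a (time x freq) spectrogram into non-overlapping p x p patches.

    Each patch is flattened row-major. Patches are ordered time-first, then
    frequency. Assumes (hypothetically) both dimensions are divisible by p.
    """
    t, f = len(spec), len(spec[0])
    assert t % p == 0 and f % p == 0, "dimensions must be divisible by patch size"
    patches = []
    for i in range(0, t, p):
        for j in range(0, f, p):
            patches.append([spec[i + di][j + dj] for di in range(p) for dj in range(p)])
    return patches


def random_mask(
    patches: List[List[float]], mask_ratio: float, seed: int = 0
) -> Tuple[List[List[float]], List[int], List[int]]:
    """MAE-style random masking (illustrative sketch).

    Returns the visible patches passed to the encoder, their indices, and a
    binary mask over all patches where 1 = masked (to be reconstructed).
    """
    n = len(patches)
    # round() avoids float artifacts such as 10 * (1 - 0.8) = 1.999...
    n_keep = round(n * (1 - mask_ratio))
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    keep = sorted(order[:n_keep])
    keep_set = set(keep)
    mask = [0 if k in keep_set else 1 for k in range(n)]
    visible = [patches[k] for k in keep]
    return visible, keep, mask


if __name__ == "__main__":
    # Toy 8 x 8 "spectrogram" with 4 x 4 patches -> 4 patches in total.
    spec = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
    patches = patchify(spec, 4)
    visible, keep, mask = random_mask(patches, mask_ratio=0.75, seed=0)
    print(len(patches), len(visible), sum(mask))  # 4 1 3
```

With a high masking ratio only a small fraction of patches reaches the encoder, which is what keeps MAE-style pre-training cheap. The decoder would then be asked to reconstruct the masked patches from the visible ones.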