Masked Autoencoders that Listen

Jan-18-2025, 16:10:04 GMT–Neural Information Processing Systems

This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers. The decoder then re-orders and decodes the encoded context padded with mask tokens, in order to reconstruct the input spectrogram. We find it beneficial to incorporate local window attention in the decoder, as audio spectrograms are highly correlated in local time and frequency bands.
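The re-ordering step described in the abstract can be sketched roughly as follows. This is a minimal, framework-free illustration rather than the authors' implementation: the helper names `random_mask` and `restore_order`, the string stand-ins for patch embeddings, the `"[MASK]"` placeholder, and the 0.75 masking ratio are all assumptions chosen for the example.

```python
import random


def random_mask(num_patches, mask_ratio, rng):
    """Randomly split patch indices into (keep_ids, masked_ids)."""
    ids = list(range(num_patches))
    rng.shuffle(ids)
    num_keep = int(num_patches * (1 - mask_ratio))
    return ids[:num_keep], ids[num_keep:]


def restore_order(encoded, keep_ids, num_patches, mask_token):
    """Put encoded tokens back at their original positions.

    Every position that was masked out is filled with mask_token,
    so the decoder sees a full-length, correctly ordered sequence.
    """
    full = [mask_token] * num_patches
    for token, idx in zip(encoded, keep_ids):
        full[idx] = token
    return full


rng = random.Random(0)  # fixed seed so the example is repeatable

# Stand-ins for spectrogram patch embeddings (hypothetical values).
patches = [f"p{i}" for i in range(8)]

# Illustrative masking ratio; an assumption, not a value from the text above.
keep_ids, masked_ids = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Only the visible (non-masked) patches would go through the encoder.
visible = [patches[i] for i in keep_ids]

# The decoder input is the encoded context padded with mask tokens,
# restored to the original patch order.
decoder_input = restore_order(visible, keep_ids, len(patches), "[MASK]")
print(decoder_input)
```

In a real model the list entries would be embedding vectors and the mask token a learned vector shared across all masked positions; the index bookkeeping stays the same.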

audio spectrogram, masked autoencoder, spectrogram

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)