Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners

Open in new window