But while people appear to learn in a similar way regardless of how they get information (whether they use sight or sound, for example), there are currently big differences in the way self-supervised learning algorithms learn from images, speech, text, and other modalities. This discrepancy has been a significant barrier to applying advances in self-supervised learning more broadly. Because a powerful algorithm designed for, say, understanding images can't be directly applied to another modality, such as text, it is difficult to push several modalities ahead at the same rate. This is why Meta AI developed and is excited to announce data2vec, the first high-performance self-supervised algorithm that works for multiple modalities. We applied data2vec separately to speech, images, and text, and it outperformed the previous best single-purpose algorithms for computer vision and speech while remaining competitive on NLP tasks.