Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
