Learning Representations from Audio-Visual Spatial Alignment
–Neural Information Processing Systems
We introduce a novel self-supervised pretext task for learning representations from audio-visual content.
Dec-23-2025, 22:17:47 GMT