Learning Representations from Audio-Visual Spatial Alignment
–Neural Information Processing Systems
While these approaches learn high-quality representations for downstream tasks such as action recognition, their training objectives disregard spatial cues naturally occurring in audio and visual signals.
Feb-8-2026, 00:42:51 GMT