Learning Representations from Audio-Visual Spatial Alignment

Neural Information Processing Systems 

While these approaches learn high-quality representations for downstream tasks such as action recognition, their training objectives disregard spatial cues naturally occurring in audio and visual signals.
