Review for NeurIPS paper: Learning Representations from Audio-Visual Spatial Alignment

Jan-23-2025, 06:04:03 GMT–Neural Information Processing Systems

Saying the models completely disregard spatial information is too strong a statement, as these models can easily be repurposed to localize sound sources to some extent. I believe there is some miscommunication: I meant using the model for a downstream task that requires audio-visual spatial alignment. The authors report results on the AVSA self-supervision task and compare them to other methods such as AVC, but that is the self-supervised (pretext) task setup rather than an actual downstream task.

audio-visual spatial alignment, learning representation, neurips paper
