Review for NeurIPS paper: Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Neural Information Processing Systems 

Additional Feedback: The paper presents a framework for localizing sounding objects in an audiovisual scene. Overall, I liked the paper. The proposed approach is neat and, for the most part, makes sense. I have a few points of concern and would like to see the authors' responses to them. I would be happy to raise my overall score if the responses are satisfactory.