Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
