Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization
