Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos

Open in new window