Integrating Spatial and Semantic Embeddings for Stereo Sound Event Localization in Videos