Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos