BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment