Mind the Gap: A Generalized Approach for Cross-Modal Embedding Alignment