Promoting cross-modal representations to improve multimodal foundation models for physiological signals