Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization