Balance-aware Sequence Sampling Makes Multi-modal Learning Better