On-the-fly Modulation for Balanced Multimodal Learning