On the Computational Benefit of Multimodal Learning