On the Out-Of-Distribution Generalization of Multimodal Large Language Models