MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale