Towards Robust Multi-Modal Reasoning via Model Selection