Robust image classification with multi-modal large language models