MIMIC-IT: Multi-Modal In-Context Instruction Tuning