MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping
