Hierarchical Vision-Language Reasoning for Multimodal Multiple-Choice Question Answering

Open in new window