From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models

Open in new window