Visual cognition in multimodal large language models

Open in new window