VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models
