VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
