VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
