Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding
Jiang, Zhiyuan, Xie, Shenghao, Li, Wenyi, Zu, Wenqiang, Li, Peihang, Qiu, Jiahao, Pei, Siqi, Ma, Lei, Huang, Tiejun, Wang, Mengdi, Liu, Shilong
arXiv.org Artificial Intelligence
Grounding is a fundamental capability for building graphical user interface (GUI) agents. Although existing approaches rely on large-scale bounding box supervision, they still face various challenges, such as cross-platform generalization, complex layout analysis, and fine-grained element localization. In this paper, we investigate zoom as a strong yet underexplored prior for GUI grounding, and propose a training-free method, ZoomClick. By characterizing four key properties of zoom (i.e., pre-zoom, depth, shrink size, minimal crop size), we unlock its full capabilities for dynamic spatial focusing and adaptive context switching. Experiments demonstrate that our method significantly boosts the performance of both general vision-language and specialized GUI grounding models, achieving state-of-the-art results on several mainstream benchmarks; for example, UI-Venus-72B attains a 73.1% success rate on ScreenSpot-Pro. Furthermore, we present GUIZoom-Bench, a benchmark for evaluating model adaptability to zoom, aiming to inspire future research on improving zoom for further training and test-time scaling in GUI grounding tasks.
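The abstract names the zoom properties but does not spell out the procedure, so the following is only a minimal sketch of what an iterative, training-free zoom loop could look like, not the authors' ZoomClick implementation. The function name `zoom_click`, the `predict` callback, and the reading of "depth" as the number of zoom steps, "shrink size" as a per-step crop ratio, and "minimal crop size" as a stopping threshold are all assumptions made for illustration; pre-zoom is not modeled here.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), original-image pixels
Point = Tuple[float, float]


def zoom_click(
    predict: Callable[[Box], Point],  # hypothetical model wrapper, see note below
    width: float,
    height: float,
    depth: int = 3,          # assumed meaning: maximum number of zoom-in steps
    shrink: float = 0.5,     # assumed meaning: crop side ratio applied per step
    min_crop: float = 64.0,  # assumed meaning: smallest crop side allowed, in pixels
) -> Point:
    """Iteratively crop around the predicted point and re-predict.

    `predict(box)` is assumed to return a click location normalized to
    [0, 1] x [0, 1] *within* the given crop. The result is expressed in
    original-image pixel coordinates.
    """
    box: Box = (0.0, 0.0, float(width), float(height))
    point: Point = (width / 2.0, height / 2.0)
    for _ in range(depth + 1):  # one full-image pass plus up to `depth` zooms
        nx, ny = predict(box)
        left, top, right, bottom = box
        w, h = right - left, bottom - top
        # Map the crop-relative prediction back to original-image coordinates.
        x, y = left + nx * w, top + ny * h
        point = (x, y)
        new_w, new_h = w * shrink, h * shrink
        if new_w < min_crop or new_h < min_crop:
            break  # the next crop would be too small to keep useful context
        # Center the next crop on the prediction, clamped to the image bounds.
        new_left = min(max(x - new_w / 2.0, 0.0), width - new_w)
        new_top = min(max(y - new_h / 2.0, 0.0), height - new_h)
        box = (new_left, new_top, new_left + new_w, new_top + new_h)
    return point
```

In practice `predict` would crop the screenshot to `box`, pass the crop and the instruction to a grounding model, and normalize the model's output; the sketch leaves that part abstract so it stays independent of any particular model API.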
Dec-8-2025
- Country:
  - Asia
    - China
      - Hong Kong (0.04)
      - Shaanxi Province > Xi'an (0.04)
    - Myanmar > Tanintharyi Region
      - Dawei (0.04)
  - Europe > Italy
    - Calabria > Catanzaro Province > Catanzaro (0.04)
  - North America > United States
    - Michigan (0.04)
- Genre:
  - Research Report (1.00)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning (1.00)
      - Natural Language > Large Language Model (1.00)
      - Representation & Reasoning (1.00)
      - Vision (1.00)
    - Graphics (1.00)
    - Human Computer Interaction > Interfaces (0.86)