MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning
Tang, Liujian, Dong, Shaokang, Huang, Yijia, Xiang, Minqi, Ruan, Hongtao, Wang, Bin, Li, Shuo, Xi, Zhiheng, Cao, Zhihui, Pang, Hailiang, Kong, Heng, Yang, He, Chai, Mingxu, Gao, Zhilin, Liu, Xingyu, Fu, Yingnan, Liu, Jiaming, Huang, Xuanjing, Jiang, Yu-Gang, Gui, Tao, Zhang, Qi, Wang, Kang, Zhang, Yunke, Wang, Yuran
–arXiv.org Artificial Intelligence
This paper presents MagicGUI, a foundational mobile GUI agent designed to address critical challenges in perception, grounding, and reasoning within real-world mobile GUI environments. The framework is underpinned by the following six key components: (1) a comprehensive and accurate dataset, constructed via the scalable GUI Data Pipeline, which aggregates the largest and most diverse GUI-centric multimodal data to date from open-source repositories, automated crawling, and targeted manual annotation; (2) enhanced perception and grounding capabilities, facilitating fine-grained multimodal alignment for UI element referencing, grounding, and screen comprehension; (3) a comprehensive and unified action space, encompassing both fundamental UI operations and complex interactive intents to support human-agent interactions; (4) planning-oriented reasoning mechanisms that enable the model to decompose complex user instructions into sequential actions with explicit intermediate meta-plan reasoning; (5) an iterative two-stage training procedure, combining large-scale continued pre-training on 7.8M samples with reinforcement fine-tuning that uses a spatially enhanced composite reward and a dual filtering strategy; and (6) competitive performance on both the proprietary Magic-RICH benchmark and over a dozen public benchmarks, achieving superior results across GUI perception and agent tasks while demonstrating robust generalization and real-world deployment potential in practical mobile GUI scenarios, as detailed in Figure 1.
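The abstract names a "spatially enhanced composite reward" without giving its form. As a rough illustration only, the sketch below shows one way such a reward could combine a format check, an action-type match, and a distance-aware grounding term. Every name, weight, and the exponential decay (`Action`, `spatial_score`, `composite_reward`, `weights`, `scale`) is a hypothetical assumption for illustration, not the authors' formulation.

```python
from dataclasses import dataclass
from math import exp, hypot
from typing import Optional, Tuple

# Hypothetical types: the abstract does not define the action schema.
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class Action:
    kind: str                                    # e.g. "tap", "scroll", "type"
    point: Optional[Tuple[float, float]] = None  # predicted screen coordinate


def spatial_score(point: Tuple[float, float], box: Box, scale: float = 100.0) -> float:
    """Return 1.0 inside the target box, decaying with distance outside it."""
    left, top, right, bottom = box
    x, y = point
    dx = max(left - x, 0.0, x - right)   # horizontal gap to the box (0 if inside)
    dy = max(top - y, 0.0, y - bottom)   # vertical gap to the box (0 if inside)
    return exp(-hypot(dx, dy) / scale)


def composite_reward(
    pred: Action,
    target_kind: str,
    target_box: Optional[Box],
    well_formed: bool,
    weights: Tuple[float, float, float] = (0.1, 0.4, 0.5),  # assumed weights
) -> float:
    """Sum a format term, an action-type term, and a spatial term."""
    w_format, w_kind, w_spatial = weights
    if not well_formed:
        return 0.0                       # unparseable output earns nothing
    reward = w_format
    if pred.kind != target_kind:
        return reward                    # wrong action type: format credit only
    reward += w_kind
    if target_box is None:
        reward += w_spatial              # non-spatial action: full credit
    elif pred.point is not None:
        reward += w_spatial * spatial_score(pred.point, target_box)
    return reward
```

In this sketch a tap that lands just outside the target element still earns partial credit, which gives a smoother training signal than a binary hit-or-miss check; whether MagicGUI's reward behaves this way is not stated in the abstract.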
Sep-12-2025
- Country:
- Asia
- China (0.04)
- Middle East > Jordan (0.04)
- Genre:
- Research Report (1.00)
- Workflow (1.00)
- Industry:
- Information Technology (0.46)
- Technology:
- Information Technology
- Artificial Intelligence
- Cognitive Science > Problem Solving (0.88)
- Machine Learning > Neural Networks
- Deep Learning (0.68)
- Natural Language
- Chatbot (0.93)
- Large Language Model (1.00)
- Representation & Reasoning > Agents (0.87)
- Communications > Mobile (1.00)
- Graphics (1.00)
- Human Computer Interaction > Interfaces (1.00)