Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills
Xie, Yuquan, Li, Zaijing, Shao, Rui, Chen, Gongwei, Zhou, Kaiwen, Li, Yinchuan, Jiang, Dongmei, Nie, Liqiang
arXiv.org Artificial Intelligence
Recent efforts to leverage Multimodal Large Language Models (MLLMs) as GUI agents have yielded promising outcomes. However, these agents still struggle with long-horizon tasks in online environments, primarily due to insufficient knowledge and the inherent gap between offline and online domains. In this paper, inspired by how humans generalize knowledge in open-ended environments, we propose a Hierarchical Multimodal Skills (HMS) module to address the problem of insufficient knowledge. HMS progressively abstracts trajectories into execution skills, core skills, and ultimately meta-skills, providing a hierarchical knowledge structure for long-horizon task planning. To bridge the domain gap, we propose the Skill-Augmented Monte Carlo Tree Search (SA-MCTS) algorithm, which efficiently leverages skills acquired in offline environments to reduce the action search space during online tree exploration. Building on HMS, we propose Mirage-1, a multimodal, cross-platform, plug-and-play GUI agent. To validate the performance of Mirage-1 in real-world long-horizon scenarios, we construct a new benchmark, AndroidLH. Experimental results show that Mirage-1 outperforms previous agents by 32%, 19%, 15%, and 79% on AndroidWorld, MobileMiniWob++, Mind2Web-Live, and AndroidLH, respectively. Project page: https://cybertronagent.github.io/Mirage-1.github.io/
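The abstract describes SA-MCTS only at a high level: skills acquired offline shrink the set of actions that tree search must consider online. The Python sketch below illustrates that general idea and is not the authors' algorithm. Everything in it is a hypothetical toy assumption: a flat skill library (`SKILLS`) standing in for the hierarchical skills, made-up GUI action names, 20 irrelevant `noise_*` actions, a single goal sequence with a 0/1 reward, and a prefix-matching rule (`candidate_actions`) under which expansion and rollouts only consider the next step of a skill that the action history is already following. The search itself is plain UCT-based Monte Carlo Tree Search.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical offline skill library: each skill is an ordered list of
# execution-level GUI actions (names are illustrative, not from the paper).
SKILLS = {
    "search_item": ["open_app", "tap_search", "type_query", "tap_result"],
    "add_to_cart": ["tap_result", "tap_add", "open_cart"],
}
# Full action space: the skill actions plus 20 irrelevant "noise" actions.
ALL_ACTIONS = sorted({a for s in SKILLS.values() for a in s}
                     | {f"noise_{i}" for i in range(20)})
# Hypothetical long-horizon task: the exact action sequence that succeeds.
GOAL = ("open_app", "tap_search", "type_query", "tap_result", "tap_add", "open_cart")
MAX_DEPTH = len(GOAL)


@dataclass
class Node:
    state: tuple                      # actions executed so far
    parent: Optional["Node"] = None
    children: dict = field(default_factory=dict)
    visits: int = 0
    value: float = 0.0                # sum of rewards backed up through this node


def candidate_actions(state, use_skills=True):
    """Without skills: every action. With skills: for each skill, the next
    step after the longest skill prefix that the history ends with."""
    if not use_skills:
        return ALL_ACTIONS
    nxt = set()
    for steps in SKILLS.values():
        for k in range(min(len(state), len(steps) - 1), -1, -1):
            if state[len(state) - k:] == tuple(steps[:k]):
                nxt.add(steps[k])
                break
    return sorted(nxt)


def is_terminal(state):
    return len(state) >= MAX_DEPTH


def reward(state):
    # Toy 0/1 reward: success only if the exact goal sequence was executed.
    return 1.0 if state == GOAL else 0.0


def uct_select(node, c=1.4):
    # UCB1 score: mean reward plus an exploration bonus.
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))


def mcts(iterations, use_skills, seed=0):
    rng = random.Random(seed)
    root = Node(state=())
    for _ in range(iterations):
        node = root
        # Selection and expansion: descend with UCT until a node still has an
        # untried candidate action, then add exactly one new child.
        while not is_terminal(node.state):
            untried = [a for a in candidate_actions(node.state, use_skills)
                       if a not in node.children]
            if untried:
                a = rng.choice(untried)
                node.children[a] = Node(state=node.state + (a,), parent=node)
                node = node.children[a]
                break
            node = uct_select(node)
        # Simulation: random rollout restricted to the same candidate set.
        state = node.state
        while not is_terminal(state):
            state = state + (rng.choice(candidate_actions(state, use_skills)),)
        r = reward(state)
        # Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return root


def best_plan(root):
    # Follow the most-visited child at each level.
    plan, node = [], root
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        plan.append(action)
    return tuple(plan)
```

Under these assumptions the skill-guided tree has at most two children per node, while the unguided tree branches 26 ways at every step. With a budget of a few thousand iterations (for example `best_plan(mcts(2000, use_skills=True))`), the guided search can concentrate its visits on the six-step goal path, whereas the unguided search is still expanding its first few levels.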
Jun-13-2025
- Country:
  - Asia
    - China
      - Guangdong Province > Shenzhen (0.04)
      - Heilongjiang Province > Harbin (0.04)
      - Ningxia Hui Autonomous Region > Yinchuan (0.04)
    - Indonesia > Bali (0.04)
    - Japan > Honshū
      - Chūbu > Toyama Prefecture > Toyama (0.04)
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
  - North America
    - Canada > Ontario
      - Middlesex County > London (0.04)
    - United States > New York (0.04)
- Genre:
- Research Report > New Finding (0.34)
- Industry:
- Information Technology (0.46)
- Technology: