MGA: Memory-Driven GUI Agent for Observation-Centric Interaction

Cheng, Weihua, Ni, Ersheng, Wang, Wenlong, Sun, Yifei, Liu, Junming, Shen, Wangyu, Chen, Yirong, Shi, Botian, Wang, Ding

Oct-29-2025–arXiv.org Artificial Intelligence

The rapid progress of Large Language Models (LLMs) and their multimodal extensions (MLLMs) has enabled agentic systems capable of perceiving and acting across diverse environments. A challenging yet impactful frontier is the development of GUI agents, which must navigate complex desktop and web interfaces while maintaining robustness and generalization. Existing paradigms typically model tasks as long-chain executions, concatenating historical trajectories into the context. While approaches such as Mirage and GTA1 refine planning or introduce multi-branch action selection, they remain constrained by two persistent issues: Dependence on historical trajectories, which amplifies error propagation. And Local exploration bias, where "decision-first, observation-later" mechanisms overlook critical interface cues. We introduce the Memory-Driven GUI Agent (MGA), which reframes GUI interaction around the principle of observe first, then decide. MGA models each step as an independent, context-rich environment state represented by a triad: current screenshot, task-agnostic spatial information, and a dynamically updated structured memory. Experiments on OSworld benchmarks, real desktop applications (Chrome, VSCode, VLC), and cross-task transfer demonstrate that MGA achieves substantial gains in robustness, generalization, and efficiency compared to state-of-the-art baselines. The code is publicly available at: {https://anonymous.4open.science/r/MGA-3571}.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Oct-29-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States > California > Los Angeles County > Los Angeles (0.16)

Genre:
- Research Report (0.50)
- Workflow (0.47)

Technology:
- Information Technology
  - Graphics (1.00)
  - Artificial Intelligence
    - Representation & Reasoning > Agents (1.00)
    - Natural Language > Large Language Model (1.00)
    - Machine Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found