OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents

Cheng, Pengzhou; Wu, Zheng; Wu, Zongru; Zhang, Aston; Zhang, Zhuosheng; Liu, Gongshen

Feb-26-2025, arXiv.org Artificial Intelligence

Autonomous graphical user interface (GUI) agents powered by multimodal large language models (MLLMs) have shown great promise. However, a critical yet underexplored issue persists: over-execution, where the agent carries out tasks fully autonomously without adequately assessing its confidence in each action, which undermines adaptive human-agent collaboration. This poses substantial risks in complex scenarios, such as those involving ambiguous user instructions, unexpected interruptions, and environmental hijacks. To address the issue, we introduce OS-Kairos, an adaptive GUI agent that predicts a confidence level at each interaction step and efficiently decides whether to act autonomously or seek human intervention. OS-Kairos is developed through two key mechanisms: (i) collaborative probing, which annotates confidence scores at each interaction step; and (ii) confidence-driven interaction, which leverages these confidence scores to elicit the ability of adaptive interaction. Experimental results show that OS-Kairos substantially outperforms existing models on our curated dataset featuring complex scenarios, as well as on established benchmarks such as AITZ and Meta-GUI, with improvements of 24.59% to 87.29% in task success rate. OS-Kairos facilitates adaptive human-agent collaboration, prioritizing effectiveness, generality, scalability, and efficiency for real-world GUI interaction. The dataset and code are available at https://github.com/Wuzheng02/OS-Kairos.
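
The per-step decision described in the abstract (act autonomously or seek human intervention, depending on predicted confidence) can be illustrated with a minimal sketch. This is not the OS-Kairos implementation. The function name `confidence_gated_steps`, the `ask_human` callback, the [0, 1] confidence scale, and the 0.8 threshold are all assumptions made for illustration, and none of them are taken from the paper or its repository.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: (proposed action, confidence assumed to lie in [0, 1]).
Proposal = Tuple[str, float]


def confidence_gated_steps(
    proposals: List[Proposal],
    ask_human: Callable[[int, str], str],
    threshold: float = 0.8,  # assumed cutoff, not a value from the paper
) -> List[Tuple[str, str]]:
    """Return a trace of (actor, action) pairs, one per interaction step.

    A step is executed by the agent when its confidence reaches the
    threshold; otherwise the step is handed to the human callback.
    """
    trace: List[Tuple[str, str]] = []
    for step_index, (action, confidence) in enumerate(proposals):
        if confidence >= threshold:
            # Confident enough: the agent acts on its own proposal.
            trace.append(("agent", action))
        else:
            # Low confidence: defer to the human, who may override the action.
            trace.append(("human", ask_human(step_index, action)))
    return trace


if __name__ == "__main__":
    steps = [("open_settings", 0.95), ("tap_delete_account", 0.40), ("press_back", 0.85)]
    for actor, action in confidence_gated_steps(steps, lambda i, a: "tap_cancel"):
        print(actor, action)
```

In this sketch only the second step falls below the assumed threshold, so it is the only one routed to the human callback. How OS-Kairos actually scores and thresholds confidence is described in the paper itself.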

large language model, machine learning, natural language

Country:
- Asia > Thailand (0.14)
- North America > United States (0.14)

Genre:
- Research Report > New Finding (0.48)
- Workflow (1.00)

Industry:
- Information Technology (0.93)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Performance Analysis
      - Accuracy (0.46)
    - Natural Language > Large Language Model (0.91)
    - Representation & Reasoning > Agents (0.86)
  - Communications > Mobile (0.94)
  - Graphics (1.00)