ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

Lai, Hanyu, Liu, Xiao, Zhao, Yanxiao, Xu, Han, Zhang, Hanchen, Jing, Bohao, Ren, Yanyu, Yao, Shuntian, Dong, Yuxiao, Tang, Jie

Oct-22-2025–arXiv.org Artificial Intelligence

We introduce ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully. ComputerRL features the API-GUI paradigm, which unifies programmatic API calls and direct GUI interaction to address the inherent mismatch between machine agents and human-centric desktop environments. Scaling end-to-end RL training is crucial for improvement and generalization across diverse desktop tasks; however, it remains challenging due to environmental inefficiency and instability during extended training. To support scalable and robust training, we develop a distributed RL infrastructure capable of orchestrating thousands of parallel virtual desktop environments to accelerate large-scale online RL. Furthermore, we propose Entropulse, a training strategy that alternates reinforcement learning with supervised fine-tuning, effectively mitigating entropy collapse during extended training runs. We employ ComputerRL on open models GLM-4-9B-0414 and GLM-4.1V-9B-Thinking, and evaluate them on the OSWorld benchmark. The AutoGLM-OS-9B achieves a new state-of-the-art accuracy of 48.9%, demonstrating significant improvements for general agents in desktop automation. Our code and the new OfficeWorld benchmark are available at https://github.com/thudm/ComputerRL. The algorithm and framework are adopted in building AutoGLM (Liu et al., 2024b).

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

Oct-22-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.67)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (0.93)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.70)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Reinforcement Learning (0.70)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found