DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Hao Bai 1,2 Yifei Zhou 1 Jiayi Pan

May-28-2025, 14:11:59 GMT–Neural Information Processing Systems

While training with static demonstrations has shown some promise, we show that such methods fall short for controlling real GUIs because they fail to handle the real-world stochasticity and non-stationarity that static observational data does not capture. This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device-control agents by fine-tuning a pre-trained VLM in two stages: offline RL to initialize the model, followed by offline-to-online RL. To do this, we build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator and develop a simple yet effective RL approach for learning in this domain. Our approach runs advantage-weighted RL with advantage estimators enhanced to account for stochasticity, along with an automatic curriculum for deriving maximal learning signal. We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild (AitW) dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement (from 17.7% to 67.2% success rate) over supervised fine-tuning with static human demonstration data. These results significantly surpass not only the prior best agents, including AppAgent with GPT-4V (8.3% success rate) and the 17B CogAgent trained with AitW data (38.5%),
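To make the phrase "advantage-weighted RL" concrete, here is a minimal sketch of a generic advantage-weighted objective: each sampled action's log-likelihood is weighted by an exponentiated advantage, so actions that did better than expected are reinforced more strongly. This is an illustration under stated assumptions, not the estimator or loss used in DigiRL; the function names, the temperature `beta`, and the clipping value `max_weight` are all hypothetical.

```python
import math


def advantage_weights(advantages, beta=1.0, max_weight=20.0):
    """Exponentiated-advantage weights, clipped so one sample cannot dominate.

    `beta` (temperature) and `max_weight` (clip) are illustrative assumptions.
    """
    return [min(math.exp(a / beta), max_weight) for a in advantages]


def advantage_weighted_loss(log_probs, advantages, beta=1.0, max_weight=20.0):
    """Weighted negative log-likelihood, averaged over the batch.

    log_probs:  log pi(a_i | s_i) for each sampled action
    advantages: an advantage estimate for each of those actions
    """
    weights = advantage_weights(advantages, beta, max_weight)
    weighted = [w * lp for w, lp in zip(weights, log_probs)]
    return -sum(weighted) / len(weighted)


# Toy batch: three actions with positive, negative, and zero advantage.
log_probs = [-0.5, -1.2, -0.3]
advantages = [1.0, -1.0, 0.0]
loss = advantage_weighted_loss(log_probs, advantages)
```

In a real training loop the log-probabilities would come from the policy network and the loss would be minimized by gradient descent, with the weights treated as constants.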

large language model, machine learning, reinforcement learning, (23 more...)

Country:
- North America > United States (0.68)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Education > Educational Setting
  - Online (0.46)
- Energy > Oil & Gas (0.46)
- Information Technology > Services (0.68)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning
      - Neural Networks > Deep Learning (1.00)
      - Reinforcement Learning (1.00)
    - Natural Language > Large Language Model (1.00)
    - Representation & Reasoning > Agents (0.67)
  - Communications > Mobile (0.88)
  - Information Management > Search (0.94)