AITopics

Technology:

Information Technology > Artificial Intelligence (0.95)
Information Technology > Human Computer Interaction > Interfaces (0.58)

Neural Information Processing Systems Feb-8-2026, 12:17:59 GMT

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Bai, Hao, Zhou, Yifei

While training with static demonstrations has shown some promise, we show that such methods fall short for controlling real GUIs due to their failure to deal with real-world stochasticity and non-stationarity not captured in static observational data.

large language model, machine learning, reinforcement learning

Country:

South America > Chile (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology > Services (0.68)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing Systems Oct-9-2025, 19:27:25 GMT

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Bai, Hao, Zhou, Yifei

agent, digirl, trajectory

Country:

South America > Chile (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology > Services (0.68)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing Systems May-26-2025, 17:13:37 GMT

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

artificial intelligence, human computer interaction, machine learning

Technology:

Information Technology > Human Computer Interaction > Interfaces (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

arXiv.org Artificial Intelligence Feb-26-2025

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Zheng, Jiani, Wang, Lu, Yang, Fangkai, Zhang, Chaoyun, Mei, Lingrui, Yin, Wenjie, Lin, Qingwei, Zhang, Dongmei, Rajmohan, Saravan, Zhang, Qi

Training Vision-Language Models (VLMs) as graphical user interface (GUI) agents via Reinforcement Learning (RL) faces critical challenges: environment-based RL requires costly interactions, while environment-free methods struggle with distribution shift and reward generalization. We propose an environment-free RL framework that decouples value estimation from policy optimization by leveraging a pretrained Value Environment Model (VEM). VEM predicts state-action values directly from offline data, distilling human-like priors about GUI interaction outcomes without requiring next-state prediction or environmental feedback. This avoids compounding errors and enhances resilience to UI changes by focusing on semantic reasoning (e.g., "Does this action advance the user's goal?"). The framework operates in two stages: (1) pretraining VEM to estimate long-term action utilities and (2) guiding policy exploration with frozen VEM signals, enabling layout-agnostic GUI automation. Evaluated on Android-in-the-Wild benchmarks, VEM achieves state-of-the-art performance in both offline and online settings, significantly outperforming environment-free baselines and matching environment-based approaches without incurring interaction costs. Importantly, VEM demonstrates that semantic-aware value estimation can achieve performance comparable to online-trained methods.
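The two-stage idea above (estimate action values from offline data only, then freeze them and let them steer the policy) can be illustrated with a toy tabular sketch. This is not VEM itself: VEM is a pretrained VLM, whereas here Q(s, a) is simply the mean logged return per (state, action) pair, and a greedy argmax stands in for the policy-optimization stage. The function names `pretrain_value_model` and `greedy_policy`, and the string states and actions, are hypothetical.

```python
from collections import defaultdict
from types import MappingProxyType
from typing import Hashable, Iterable, Mapping, Sequence, Tuple

State = Hashable
Action = Hashable


def pretrain_value_model(
    offline_data: Iterable[Tuple[State, Action, float]],
) -> Mapping[Tuple[State, Action], float]:
    """Stage 1 (toy): estimate Q(s, a) as the mean observed return per pair.

    Only logged (state, action, return) tuples are used; there is no
    next-state prediction and no environment interaction.
    """
    totals: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for state, action, ret in offline_data:
        totals[(state, action)] += ret
        counts[(state, action)] += 1
    q = {key: totals[key] / counts[key] for key in totals}
    # Freeze the value model: the policy stage can read it but not modify it.
    return MappingProxyType(q)


def greedy_policy(
    q: Mapping[Tuple[State, Action], float],
    state: State,
    candidates: Sequence[Action],
    default: float = 0.0,
) -> Action:
    """Stage 2 (toy): choose the candidate the frozen value model rates highest.

    Unseen (state, action) pairs fall back to `default` (an assumption).
    """
    return max(candidates, key=lambda a: q.get((state, a), default))


data = [
    ("home", "tap_settings", 1.0),
    ("home", "tap_settings", 0.8),
    ("home", "tap_camera", 0.0),
]
q = pretrain_value_model(data)
print(greedy_policy(q, "home", ["tap_camera", "tap_settings"]))  # tap_settings
```

Freezing the table with `MappingProxyType` mirrors the "frozen VEM signals" stage: any attempt to write to it raises `TypeError`, so policy-side code cannot drift the value estimates.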

agent, arxiv preprint arxiv, interaction

2502.18906

Country:

Europe > United Kingdom > England (0.05)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report (0.64)
Workflow (0.47)

Industry: Information Technology (0.46)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

arXiv.org Artificial Intelligence Dec-3-2024

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Qi, Zehan, Liu, Xiao, Iong, Iat Long, Lai, Hanyu, Sun, Xueqiao, Zhao, Wenyi, Yang, Yu, Yang, Xinyue, Sun, Jiadai, Yao, Shuntian, Zhang, Tianjie, Xu, Wei, Tang, Jie, Dong, Yuxiao

Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. However, existing LLM web agents rely heavily on expensive proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities. This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. WebRL addresses three key challenges in building LLM web agents: the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. Specifically, WebRL incorporates (1) a self-evolving curriculum that generates new tasks from unsuccessful attempts, (2) a robust outcome-supervised reward model (ORM), and (3) adaptive reinforcement learning strategies to ensure consistent improvements. We apply WebRL to transform open Llama-3.1 and GLM-4 models into proficient web agents. On WebArena-Lite, WebRL improves the success rate of Llama-3.1-8B from 4.8% to 42.4%, and that of GLM-4-9B from 6.1% to 43%. These open models significantly surpass GPT-4-Turbo (17.6%) and GPT-4o (13.9%) and outperform previous state-of-the-art web agents trained on open LLMs (AutoWebGLM, 18.2%). Our findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, paving the way for more accessible and powerful autonomous web interaction systems.
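One round of a self-evolving curriculum, in which failed tasks seed new training tasks, can be sketched as a small loop. This is a toy illustration under stated assumptions, not WebRL's implementation: `outcome_judge` stands in for "run the agent, then ask an outcome reward model whether the episode succeeded", while `make_variants` (task generation) and `keep` (task filtering) are hypothetical hooks. The integer `difficulty` field and the rule of generating easier variants are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Task:
    instruction: str
    difficulty: int  # toy stand-in for how hard a task is (assumption)


def evolve_curriculum(
    tasks: List[Task],
    outcome_judge: Callable[[Task], bool],
    make_variants: Callable[[Task], List[Task]],
    keep: Callable[[Task], bool],
) -> List[Task]:
    """One toy curriculum round: each failed task seeds new, filtered tasks."""
    seen = set(tasks)
    new_tasks: List[Task] = []
    for task in tasks:
        if outcome_judge(task):
            continue  # in this sketch, solved tasks do not seed new ones
        for variant in make_variants(task):
            if keep(variant) and variant not in seen:
                seen.add(variant)
                new_tasks.append(variant)
    return new_tasks


# Hypothetical agent that solves every task up to difficulty 2.
SKILL = 2


def judge(task: Task) -> bool:
    return task.difficulty <= SKILL


def simplify(task: Task) -> List[Task]:
    # Hypothetical generator: propose two easier variants of a failed task.
    return [
        Task(f"{task.instruction} (simplified, level {d})", d)
        for d in (task.difficulty - 1, task.difficulty - 2)
    ]


def in_range(task: Task) -> bool:
    return task.difficulty >= 1


tasks = [Task("book a flight", 1), Task("compare prices across 3 sites", 4)]
for t in evolve_curriculum(tasks, judge, simplify, in_range):
    print(t.difficulty, t.instruction)
```

Only the failed task produces new tasks here (levels 3 and 2), which captures the intent of generating training tasks near the edge of the agent's current ability. A real system would replace the hooks with LLM-based task generation and learned filtering.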

agent, webrl, instruction

2411.02337

Genre: Research Report > New Finding (1.00)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Nov-30-2024

DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents

Wang, Taiyi, Wu, Zhihao, Liu, Jianheng, Hao, Jianye, Wang, Jun, Shao, Kun

On-device control agents, especially on mobile devices, are responsible for operating mobile devices to fulfill users' requests, enabling seamless and intuitive interactions. Integrating Multimodal Large Language Models (MLLMs) into these agents enhances their ability to understand and execute complex commands, thereby improving user experience. However, fine-tuning MLLMs for on-device control presents significant challenges due to limited data availability and inefficient online training processes. This paper introduces DistRL, a novel framework designed to enhance the efficiency of online RL fine-tuning for mobile device control agents. DistRL employs centralized training and decentralized data acquisition to ensure efficient fine-tuning in the context of dynamic online interactions. Additionally, the framework is backed by our tailor-made RL algorithm, which effectively balances exploration with the prioritized utilization of collected data to ensure stable and robust training. Our experiments show that, on average, DistRL delivers a 3X improvement in training efficiency and enables training data collection 2.4X faster than the leading synchronous multi-machine methods. Notably, after training, DistRL achieves a 20% relative improvement in success rate compared to state-of-the-art methods on general Android tasks from an open benchmark, significantly outperforming existing approaches while maintaining the same training time. These results validate DistRL as a scalable and efficient solution, offering substantial improvements in both training efficiency and agent performance for real-world, in-the-wild device control tasks.
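The "centralized training, decentralized data acquisition" pattern described above can be sketched as an asynchronous actor-learner loop with a prioritized buffer. This is a single-process toy under stated assumptions, not DistRL's system or its RL algorithm: Python threads stand in for separate data-collection machines, a random number stands in for an episode outcome, and the priority rule (distance of the reward from a 0.5 baseline) is a hypothetical choice. The learner takes no gradient step; it only counts updates and records which trajectories it would train on.

```python
import heapq
import itertools
import queue
import random
import threading
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(order=True)
class Trajectory:
    neg_priority: float  # heapq is a min-heap, so store -priority
    seq: int  # tie-breaker so equal priorities still compare
    worker_id: int = field(compare=False)
    reward: float = field(compare=False)


def actor(worker_id: int, n_episodes: int, outbox: queue.Queue) -> None:
    """Decentralized collection: each worker rolls out episodes on its own
    (simulated) device and ships results without waiting for other workers."""
    rng = random.Random(worker_id)  # per-worker seed for reproducibility
    for _ in range(n_episodes):
        reward = rng.random()  # stand-in for an episode outcome in [0, 1)
        outbox.put((worker_id, reward))
    outbox.put(None)  # sentinel: this worker has finished


def learner(
    n_workers: int, inbox: queue.Queue, batch_size: int
) -> Tuple[List[Trajectory], int, List[Trajectory]]:
    """Centralized learner: consumes data as it arrives and, once enough is
    buffered, selects the highest-priority trajectories for each update."""
    buffer: List[Trajectory] = []
    seq = itertools.count()
    finished = n_updates = 0
    last_batch: List[Trajectory] = []
    while finished < n_workers:
        item = inbox.get()  # blocks until any worker sends something
        if item is None:
            finished += 1
            continue
        worker_id, reward = item
        priority = abs(reward - 0.5)  # hypothetical priority rule
        heapq.heappush(buffer, Trajectory(-priority, next(seq), worker_id, reward))
        if len(buffer) >= batch_size:
            last_batch = heapq.nsmallest(batch_size, buffer)  # top priorities
            n_updates += 1  # a real learner would take a gradient step here
    return buffer, n_updates, last_batch


def run(n_workers: int = 4, n_episodes: int = 5, batch_size: int = 3):
    inbox: queue.Queue = queue.Queue()
    workers = [
        threading.Thread(target=actor, args=(w, n_episodes, inbox))
        for w in range(n_workers)
    ]
    for w in workers:
        w.start()
    result = learner(n_workers, inbox, batch_size)
    for w in workers:
        w.join()
    return result


buffer, n_updates, last_batch = run()
print(len(buffer), n_updates)  # 20 18
```

Because the learner starts updating as soon as `batch_size` trajectories have arrived, no worker ever waits for the slowest device. That is the efficiency argument for asynchronous over synchronous multi-machine collection.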

large language model, machine learning, reinforcement learning

2410.14803

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Norway (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Education > Educational Setting > Online (1.00)
Information Technology > Services (0.93)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

arXiv.org Artificial Intelligence Jun-14-2024

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Bai, Hao, Zhou, Yifei, Cemri, Mert, Pan, Jiayi, Suhr, Alane, Levine, Sergey, Kumar, Aviral

Training corpora for vision-language models (VLMs) typically lack sufficient decision-centric data. This renders off-the-shelf VLMs suboptimal for decision-making tasks such as in-the-wild device control through graphical user interfaces (GUIs). While training with static demonstrations has shown some promise, we show that such methods fall short for controlling real GUIs due to their failure to deal with real-world stochasticity and non-stationarity not captured in static observational data. This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents by fine-tuning a pre-trained VLM in two stages: offline RL to initialize the model, followed by offline-to-online RL. To do this, we build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator and develop a simple yet effective RL approach for learning in this domain. Our approach runs advantage-weighted RL with advantage estimators enhanced to account for stochasticity, along with an automatic curriculum for deriving maximal learning signal. We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild (AitW) dataset, where our 1.3B VLM trained with RL achieves a 49.5 percentage-point absolute improvement in success rate (from 17.7% to 67.2%) over supervised fine-tuning with static human demonstration data. These results significantly surpass not only the prior best agents, including AppAgent with GPT-4V (8.3% success rate) and the 17B CogAgent trained with AitW data (38.5%), but also the prior best autonomous RL approach based on filtered behavior cloning (57.8%), thereby establishing a new state of the art for digital agents for in-the-wild device control.
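To make "advantage-weighted RL" concrete, here is a minimal sketch of generic advantage-weighted regression on a single sparse-reward episode. It assumes the simplest advantage estimator (Monte Carlo return minus a value baseline) and exponential weights clipped at `max_weight`. DigiRL's actual estimators are enhanced to handle stochasticity, and its curriculum is not modeled here, so treat the function names and hyperparameters (`beta`, `max_weight`, `gamma`) as illustrative assumptions.

```python
import math
from typing import List, Sequence


def mc_advantages(
    rewards: Sequence[float], values: Sequence[float], gamma: float = 1.0
) -> List[float]:
    """A_t = G_t - V(s_t), where G_t is the discounted Monte Carlo return."""
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    advantages: List[float] = []
    g = 0.0
    for r, v in zip(reversed(rewards), reversed(values)):
        g = r + gamma * g  # accumulate the return backwards in time
        advantages.append(g - v)
    return advantages[::-1]


def awr_weights(
    advantages: Sequence[float], beta: float = 1.0, max_weight: float = 20.0
) -> List[float]:
    """w_t = min(exp(A_t / beta), max_weight), clipped in log space so that
    large advantages cannot overflow math.exp."""
    log_cap = math.log(max_weight)
    return [math.exp(min(a / beta, log_cap)) for a in advantages]


def awr_loss(
    log_probs: Sequence[float],
    advantages: Sequence[float],
    beta: float = 1.0,
    max_weight: float = 20.0,
) -> float:
    """Advantage-weighted negative log-likelihood of the actions taken."""
    weights = awr_weights(advantages, beta, max_weight)
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


# Sparse reward: the episode succeeds only at its final step.
adv = mc_advantages(rewards=[0.0, 0.0, 1.0], values=[0.5, 0.6, 0.8])
print([round(a, 3) for a in adv])  # [0.5, 0.4, 0.2]
print(awr_loss(log_probs=[-1.0, -0.5, -0.2], advantages=adv))
```

Actions with positive advantage get weights above 1 and are imitated more strongly, while those with negative advantage are down-weighted instead of being discarded outright. This is the key difference from the filtered behavior cloning baseline mentioned in the abstract.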

large language model, machine learning, reinforcement learning

2406.11896

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry:

Information Technology > Services (0.47)
Energy > Oil & Gas (0.46)
Education > Educational Setting (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)