EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models

Tan, Zheyue; Abdullahi, Mustapha; Shi, Tuo; Yuan, Huining; Xu, Zelai; Yu, Chao; Li, Boxun; Zhao, Bo

arXiv.org Artificial Intelligence, Oct-8-2025

Reinforcement learning (RL) has become a pivotal component of large language model (LLM) post-training, and agentic RL extends this paradigm so that LLMs operate as agents through multi-turn interaction and tool use. Scaling such systems exposes two practical bottlenecks: (1) context length grows rapidly during training, inflating memory usage and latency and triggering out-of-memory (OOM) failures; and (2) intermediate tensors grow with context length, making cross-device data movement a major system bottleneck. We present EARL, a scalable system for efficient agentic RL. EARL includes a parallelism selector that dynamically adapts model and training parallelism across RL stages based on sequence length and system load, and a data dispatcher that performs layout-aware, decentralized exchange of intermediate data batches. Together, these components increase throughput, reduce long-context failures, and enable stable large-scale training of agentic LLMs without relying on hard limits or penalties on context length.
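The abstract does not specify how the parallelism selector decides, so the following is only a minimal sketch of the general idea under stated assumptions: a toy per-GPU memory model in which weights and KV/activation memory for a fixed micro-batch are split evenly across the tensor-parallel (TP) group, and a rule that picks the smallest power-of-two TP degree that fits, leaving the remaining GPUs as data-parallel (DP) replicas. The names `ParallelConfig`, `per_gpu_memory_gb`, `select_parallelism`, `kv_gb_per_1k_tokens`, and `reserved_gb` are hypothetical, and `reserved_gb` is a crude stand-in for "system load" (memory already held by co-located work). This is not EARL's actual cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    tp: int  # tensor-parallel degree: GPUs sharing one model replica
    dp: int  # data-parallel degree: number of independent replicas


def per_gpu_memory_gb(tp: int, seq_len: int, micro_batch: int,
                      weight_gb: float, kv_gb_per_1k_tokens: float) -> float:
    """Hypothetical toy cost model: weights plus KV/activation memory for
    micro_batch sequences, split evenly over the tp GPUs of one replica."""
    tokens_k = seq_len * micro_batch / 1000
    return (weight_gb + kv_gb_per_1k_tokens * tokens_k) / tp


def select_parallelism(num_gpus: int, seq_len: int, micro_batch: int,
                       weight_gb: float, kv_gb_per_1k_tokens: float,
                       gpu_mem_gb: float, reserved_gb: float = 0.0,
                       headroom: float = 0.9) -> ParallelConfig:
    """Return the smallest power-of-two tp whose per-GPU memory fits.

    Assumption: smaller tp means more replicas (higher dp) and less
    intra-replica communication, so it is preferred whenever memory allows.
    reserved_gb is a hypothetical proxy for memory held by co-located work.
    """
    budget = headroom * gpu_mem_gb - reserved_gb
    tp = 1
    while tp <= num_gpus:
        if num_gpus % tp == 0:
            need = per_gpu_memory_gb(tp, seq_len, micro_batch,
                                     weight_gb, kv_gb_per_1k_tokens)
            if need <= budget:
                return ParallelConfig(tp=tp, dp=num_gpus // tp)
        tp *= 2
    # No layout fits: this is the case where a real system would OOM.
    raise MemoryError(f"no parallel layout fits seq_len={seq_len} on {num_gpus} GPUs")


if __name__ == "__main__":
    # Illustrative numbers only: 8 GPUs with 80 GB each, 16 GB of weights.
    for seq_len in (4096, 32768, 131072):
        cfg = select_parallelism(num_gpus=8, seq_len=seq_len, micro_batch=4,
                                 weight_gb=16.0, kv_gb_per_1k_tokens=0.5,
                                 gpu_mem_gb=80.0)
        print(seq_len, cfg)
```

With these made-up numbers the selector keeps `tp=1, dp=8` at 4,096 tokens, moves to `tp=2, dp=4` at 32,768 tokens, and to `tp=4, dp=2` at 131,072 tokens. The point of the sketch is the shape of the decision: as context length grows during training, the layout trades replicas for wider model sharding instead of failing with OOM or capping the context.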

context length, large language model, machine learning
