EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models
Zheyue Tan, Mustapha Abdullahi, Tuo Shi, Huining Yuan, Zelai Xu, Chao Yu, Boxun Li, Bo Zhao
arXiv.org Artificial Intelligence
Reinforcement learning (RL) has become a pivotal component of large language model (LLM) post-training, and agentic RL extends this paradigm by training models to operate as agents through multi-turn interaction and tool use. Scaling such systems exposes two practical bottlenecks: (1) context length grows rapidly during training, inflating memory usage and latency and triggering out-of-memory (OOM) failures; and (2) intermediate tensors accumulate with context length, making cross-device data movement a major system bottleneck. We present EARL, a scalable system for efficient agentic RL. EARL introduces a parallelism selector that dynamically adapts model and training parallelism across RL stages based on sequence length and system load, and a data dispatcher that performs layout-aware, decentralized exchange of intermediate data batches. Together, these components increase throughput, reduce long-context failures, and enable stable large-scale training of agentic LLMs without relying on hard limits on, or penalties for, context length.
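The abstract does not specify how the parallelism selector makes its decisions; as a rough illustration of the idea, the toy sketch below picks the smallest tensor-parallel degree whose estimated per-GPU memory footprint fits the device. All names, thresholds, and memory constants here are hypothetical placeholders, not the paper's actual policy.

```python
def select_parallelism(seq_len,
                       gpu_mem_gb=80.0,        # assumed per-GPU memory budget
                       model_mem_gb=40.0,      # assumed unsharded model footprint
                       act_gb_per_token=0.002, # assumed activation cost per token
                       degrees=(1, 2, 4, 8)):
    """Toy selector: return the smallest tensor-parallel degree whose
    estimated per-GPU usage (model shard + sharded activations for the
    current sequence length) fits within the memory budget."""
    for tp in degrees:
        est = (model_mem_gb + seq_len * act_gb_per_token) / tp
        if est <= gpu_mem_gb:
            return tp
    return degrees[-1]  # fall back to the widest available degree
```

Re-evaluating such an estimate per RL stage (rollout vs. training) as sequences lengthen is the kind of dynamic adaptation the abstract describes, in contrast to fixing one parallelism layout for the whole run.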
Oct-8-2025