Multi-Timescale Hierarchical Reinforcement Learning for Unified Behavior and Control of Autonomous Driving

Jin, Guizhe, Li, Zhuoren, Leng, Bo, Yu, Ran, Xiong, Lu, Sun, Chen

Nov-25-2025–arXiv.org Artificial Intelligence

Reinforcement Learning (RL) is increasingly used in autonomous driving (AD) and shows clear advantages. However, most RL-based AD methods overlook policy structure design. An RL policy that outputs only short-timescale vehicle control commands produces fluctuating driving behavior because the network outputs themselves fluctuate, while one that outputs only long-timescale driving goals cannot achieve unified optimality of driving behavior and control. We therefore propose a multi-timescale hierarchical reinforcement learning approach. It adopts a hierarchical policy structure in which high- and low-level RL policies are trained jointly to produce long-timescale motion guidance and short-timescale control commands, respectively. The motion guidance is explicitly represented by hybrid actions, which capture multimodal driving behaviors on structured roads and support incremental low-level extended-state updates. In addition, a hierarchical safety mechanism is designed to ensure safety at multiple timescales. Evaluation in simulator-based and HighD dataset-based multi-lane highway scenarios shows that our approach significantly improves AD performance, increasing driving efficiency, action consistency, and safety.
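The two-timescale structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `MotionGuidance`, `high_level_policy`, `low_level_policy`, and `rollout`, the guidance period of 5 steps, the speed and acceleration ranges, and the proportional speed tracker standing in for a learned low-level policy are all assumptions made for illustration. The hybrid action is assumed here to pair a discrete lane decision with a continuous target speed, and the hierarchical safety mechanism is not modeled.

```python
import random
from dataclasses import dataclass


@dataclass
class MotionGuidance:
    """Hypothetical hybrid action: one discrete and one continuous component."""
    lane_change: int      # discrete part: -1 = left, 0 = keep, +1 = right
    target_speed: float   # continuous part, in m/s


def high_level_policy(obs, rng):
    """Stand-in for a learned high-level policy (random here, by assumption)."""
    return MotionGuidance(
        lane_change=rng.choice([-1, 0, 1]),
        target_speed=rng.uniform(20.0, 30.0),
    )


def low_level_policy(obs, guidance):
    """Stand-in for a learned low-level policy.

    The low-level input is the observation extended with the current guidance;
    a clipped proportional speed tracker replaces the neural network.
    """
    accel = 0.5 * (guidance.target_speed - obs["speed"])
    return max(-3.0, min(2.0, accel))  # clip to an assumed acceleration range


def rollout(num_steps=20, guidance_period=5, dt=0.1, seed=0):
    """Run both policies at different timescales and log each step."""
    rng = random.Random(seed)
    obs = {"speed": 25.0}
    guidance = None
    log = []
    for t in range(num_steps):
        # Long timescale: refresh the motion guidance every `guidance_period` steps.
        if t % guidance_period == 0:
            guidance = high_level_policy(obs, rng)
        # Short timescale: issue a control command at every step.
        accel = low_level_policy(obs, guidance)
        obs["speed"] += accel * dt
        log.append((t, guidance, accel))
    return log


if __name__ == "__main__":
    for t, guidance, accel in rollout():
        print(t, guidance.lane_change, round(guidance.target_speed, 2), round(accel, 3))
```

In this sketch the guidance object stays fixed between high-level decisions, so the low-level command tracks a stable target instead of a goal that changes at every step, which is the kind of action consistency the abstract refers to.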

machine learning, motion guidance, reinforcement learning

Country:
- Asia > China (0.28)

Genre:
- Research Report (0.64)

Industry:
- Automobiles & Trucks (0.86)
- Information Technology > Robotics & Automation (0.72)
- Transportation > Ground
  - Road (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Machine Learning > Reinforcement Learning (1.00)
