Multi-agent Uncertainty-Aware Pessimistic Model-Based Reinforcement Learning for Connected Autonomous Vehicles
Ruoqi Wen, Rongpeng Li, Xing Xu, Zhifeng Zhao
arXiv.org Artificial Intelligence
Abstract--Deep Reinforcement Learning (DRL) holds significant promise for achieving human-like Autonomous Vehicle (AV) capabilities, but suffers from low sample efficiency and challenges in reward design. Model-Based Reinforcement Learning (MBRL) offers improved sample efficiency and generalizability compared to Model-Free Reinforcement Learning (MFRL) in various multi-agent decision-making scenarios. Nevertheless, MBRL faces critical difficulties in estimating uncertainty during the model learning phase, which limits its scalability and applicability in real-world scenarios. Additionally, most Connected Autonomous Vehicle (CAV) studies focus on single-agent decision-making, while existing multi-agent MBRL solutions lack computationally tractable algorithms with Probably Approximately Correct (PAC) guarantees, an essential factor for ensuring policy reliability with limited training data. To address these challenges, we propose MA-PMBRL, a novel Multi-Agent Pessimistic Model-Based Reinforcement Learning framework for CAVs that incorporates a max-min optimization approach to enhance robustness and decision-making. To mitigate the inherent subjectivity of uncertainty estimation in MBRL and avoid catastrophic failures in AVs, MA-PMBRL employs a pessimistic optimization framework combined with Projected Gradient Descent (PGD) for both model and policy learning. MA-PMBRL also adopts general function approximation under partial dataset coverage to enhance learning efficiency and system-level performance. By bounding the suboptimality of the resulting policy under mild theoretical assumptions, we establish PAC guarantees for MA-PMBRL, demonstrating that the proposed framework represents a significant step toward scalable, efficient, and reliable multi-agent decision-making for CAVs.

Multi-Agent Reinforcement Learning (MARL) has emerged as a promising approach for enabling CAVs to execute complex tasks autonomously. However, the costly requirement of gathering sufficient data through extensive real-world interactions leaves MFRL prone to unstable learning and high computational overhead, making it less suitable for autonomous driving scenarios.

R. Wen and R. Li are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310058, China (e-mail: {wenruoqi, lirongpeng}@zju.edu.cn). X. Xu is with the Information and Communication Branch of State Grid Hebei Electric Power Co., Ltd, China (e-mail: hsuxing@zju.edu.cn). Z. Zhao is with Zhejiang Lab, Hangzhou 311121, China, and also with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310058, China (e-mail: zhaozf@zhejianglab.com).
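The abstract does not reproduce the paper's exact objective, but the core idea it names, max-min pessimism optimized with Projected Gradient Descent over a model uncertainty set, can be illustrated on a toy problem. The sketch below is a minimal, illustrative Python example, not the authors' implementation: the quadratic stand-in value, the L2-ball uncertainty set, and names such as pessimistic_pgd are all assumptions made for exposition.

```python
import numpy as np

def value(theta, w):
    # Toy stand-in for the policy value under model parameters w
    # (illustrative; the paper's value function is not reproduced here).
    return -np.sum((theta - w) ** 2)

def grad_theta(theta, w):
    # Gradient of the toy value with respect to the policy parameters.
    return -2.0 * (theta - w)

def grad_w(theta, w):
    # Gradient of the toy value with respect to the model parameters.
    return 2.0 * (theta - w)

def project(w, w_hat, radius):
    # Project w onto the L2 ball around the nominal model w_hat,
    # standing in for a data-driven model uncertainty set.
    diff = w - w_hat
    norm = np.linalg.norm(diff)
    return w if norm <= radius else w_hat + radius * diff / norm

def pessimistic_pgd(theta, w_hat, radius, lr=0.05, steps=200):
    # Alternating max-min updates: the adversarial model takes a projected
    # gradient step that decreases the value (pessimism), then the policy
    # takes an ascent step against that worst-case model.
    w = w_hat.copy()
    for _ in range(steps):
        w = project(w - lr * grad_w(theta, w), w_hat, radius)  # inner min over the set
        theta = theta + lr * grad_theta(theta, w)               # outer max over the policy
    return theta, w

theta, w = pessimistic_pgd(np.array([2.0, -1.0]), np.array([0.0, 0.0]), radius=0.5)
print("policy parameters:", theta)
print("worst-case model:", w)
print("pessimistic value:", value(theta, w))
```

In this toy instance the policy ends up optimizing against the least favorable model within the ball rather than the nominal estimate, which is the mechanism the abstract credits with avoiding catastrophic failures under limited data coverage.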
Mar-26-2025