Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning
Muleilan Pei, Shaoshuai Shi, Shaojie Shen
arXiv.org Artificial Intelligence
Scalable and realistic simulation of multi-agent traffic behavior is critical for advancing autonomous driving technologies. Although existing data-driven simulators have made significant strides in this domain, they predominantly rely on supervised learning to align simulated distributions with real-world driving scenarios. A persistent challenge, however, lies in the distributional shift that arises between training and testing, which often undermines model generalization in unseen environments. To address this limitation, we propose SMART-R1, a novel R1-style reinforcement fine-tuning paradigm tailored for next-token prediction models to better align agent behavior with human preferences and evaluation metrics. Our approach introduces a metric-oriented policy optimization algorithm to improve distribution alignment and an iterative "SFT-RFT-SFT" training strategy that alternates between Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT) to maximize performance gains. Results on the Waymo Open Sim Agents Challenge (WOSAC) show that SMART-R1 achieves state-of-the-art performance with an overall realism meta score of 0.7858, ranking first on the leaderboard at the time of submission.

Simulating multi-agent traffic behaviors plays a pivotal role in ensuring the safety and reliability of autonomous driving systems. However, modeling realistic and scalable traffic behaviors remains highly challenging due to the inherent uncertainty and multi-modality of human driving. Traditional simulators that simply replay logged data lack reactive capability, while rule-based methods, such as the Intelligent Driver Model (IDM) (Treiber et al., 2000), depend on handcrafted heuristics and fail to capture the diversity and realism of human behavior.
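The abstract describes two ingredients: a metric-oriented policy update that reinforces rollouts scoring above a group baseline, and an SFT-RFT-SFT schedule that alternates supervised and reinforcement phases. The toy sketch below illustrates that alternation on a tabular two-action "policy"; everything here is an assumption (the `realism_metric`, learning rates, group size, and tabular policy are illustrative stand-ins, not the paper's actual model or algorithm):

```python
# Toy sketch of metric-oriented policy optimization with an iterative
# SFT -> RFT -> SFT schedule. NOTE: all details here (realism_metric,
# tabular policy, GRPO-flavoured baseline) are illustrative assumptions;
# the abstract does not specify the paper's concrete algorithm.
import random

random.seed(0)

ACTIONS = [0, 1]            # toy next-token choices
policy = {0: 0.5, 1: 0.5}   # tabular stand-in for a next-token model

def realism_metric(action):
    """Stand-in evaluation metric: action 1 is the 'realistic' behavior."""
    return 1.0 if action == 1 else 0.0

def sft_step(policy, demo_action, lr=0.1):
    """Supervised step: nudge probability mass toward a logged demo action."""
    for a in ACTIONS:
        target = 1.0 if a == demo_action else 0.0
        policy[a] += lr * (target - policy[a])

def rft_step(policy, group_size=8, lr=0.1):
    """Metric-oriented step: sample a group of rollouts, use the group-mean
    metric score as a baseline, and reinforce above-average rollouts."""
    samples = random.choices(
        ACTIONS, weights=[policy[a] for a in ACTIONS], k=group_size)
    scores = [realism_metric(a) for a in samples]
    baseline = sum(scores) / len(scores)
    for a, s in zip(samples, scores):
        # advantage-weighted update, clamped so weights stay positive
        policy[a] = max(1e-6, policy[a] + lr * (s - baseline))
    total = sum(policy.values())  # renormalize the toy distribution
    for a in ACTIONS:
        policy[a] /= total

# Iterative "SFT -> RFT -> SFT" training schedule
for _ in range(20):
    sft_step(policy, demo_action=1)
for _ in range(20):
    rft_step(policy)
for _ in range(20):
    sft_step(policy, demo_action=1)
```

After the schedule, the policy concentrates on the action preferred by both the demonstrations and the metric; the point of the final SFT phase, per the abstract, is to consolidate RFT gains back into the supervised objective.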
Sep-30-2025