
Collaborating Authors

Buhet, Thibault


V-Max: Making RL practical for Autonomous Driving

arXiv.org Artificial Intelligence

Learning-based decision-making has the potential to enable generalizable Autonomous Driving (AD) policies, reducing the engineering overhead of rule-based approaches. Imitation Learning (IL) remains the dominant paradigm, benefiting from large-scale human demonstration datasets, but it suffers from inherent limitations such as distribution shift and imitation gaps. Reinforcement Learning (RL) presents a promising alternative, yet its adoption in AD remains limited due to the lack of standardized and efficient research frameworks. To this end, we introduce V-Max, an open research framework providing all the necessary tools to make RL practical for AD. V-Max is built on Waymax, a hardware-accelerated AD simulator designed for large-scale experimentation. We extend it using ScenarioNet's approach, enabling the fast simulation of diverse AD datasets. V-Max integrates a set of observation and reward functions, transformer-based encoders, and training pipelines. Additionally, it includes adversarial evaluation settings and an extensive set of evaluation metrics. Through a large-scale benchmark, we analyze how network architectures, observation functions, training data, and reward shaping impact RL performance.


Conditional Vehicle Trajectories Prediction in CARLA Urban Environment

arXiv.org Artificial Intelligence

Imitation learning is becoming increasingly successful for autonomous driving. End-to-end approaches (raw signal to command) perform well on relatively simple tasks such as lane keeping and navigation. Mid-to-mid approaches (environment abstraction to mid-level trajectory representation) and direct perception approaches (raw signal to performance) strive to handle more complex, real-life environments and tasks (e.g. …). In this work, we show that complex urban situations can be handled with raw signal input and a mid-level representation. We build a hybrid end-to-mid approach that predicts trajectories for neighboring vehicles and for the ego vehicle, conditioned on a navigation goal. We propose an original architecture inspired by social pooling LSTM that takes low- and mid-level data as input and produces trajectories as polynomials of time. We introduce a label augmentation mechanism to reach the level of generalization required to control a vehicle. Performance is evaluated on the CARLA 0.8 benchmark, showing significant improvements over the previously published state of the art.

1. Introduction

Modular pipelines [32] are the most widely used approach to autonomous driving. Their advantage is that the modules are interpretable and relatively mature, in particular on the perception side, given the success of deep learning for object detection ([13, 20], among many others). However, the complexity of real-world interactions makes the pipeline complex as well, especially in the planning and decision modules.
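The abstract for this paper states that predicted trajectories are represented as polynomials of time. The abstract does not give the polynomial degree, the coordinate frame, or how the network emits the coefficients. The sketch below therefore only illustrates how a polynomial trajectory could be consumed downstream. It assumes one coefficient list per axis (x and y), ordered from the constant term upward. All names (`eval_poly`, `derivative`, `sample_trajectory`) and the example coefficients are hypothetical and are not taken from the paper's code.

```python
from typing import List, Sequence, Tuple


def eval_poly(coeffs: Sequence[float], t: float) -> float:
    """Evaluate c0 + c1*t + c2*t^2 + ... at time t using Horner's method."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def derivative(coeffs: Sequence[float]) -> List[float]:
    """Coefficients of d/dt of the polynomial (e.g. position -> velocity)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def sample_trajectory(
    x_coeffs: Sequence[float],
    y_coeffs: Sequence[float],
    times: Sequence[float],
) -> List[Tuple[float, float]]:
    """Sample (x, y) waypoints from per-axis polynomial coefficients.

    Hypothetical layout: one coefficient list per axis, lowest order first.
    """
    return [(eval_poly(x_coeffs, t), eval_poly(y_coeffs, t)) for t in times]


if __name__ == "__main__":
    # Made-up coefficients: x(t) = 5t + 0.5t^2, y(t) = 0.1t^2
    x_c, y_c = [0.0, 5.0, 0.5], [0.0, 0.0, 0.1]
    print(sample_trajectory(x_c, y_c, [0.0, 1.0, 2.0]))
    print(derivative(x_c))  # velocity coefficients along x: 5 + 1.0*t
```

One appeal of a polynomial representation, under these assumptions, is that a trajectory of any horizon is summarized by a handful of coefficients, and velocities or accelerations follow analytically via `derivative` rather than by finite differences between waypoints.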