TransDiffuser: Diverse Trajectory Generation with Decorrelated Multi-modal Representation for End-to-end Autonomous Driving
Jiang, Xuefeng, Ma, Yuan, Li, Pengxiang, Xu, Leimeng, Wen, Xin, Zhan, Kun, Xia, Zhongpu, Jia, Peng, Lang, Xianpeng, Sun, Sheng
arXiv.org Artificial Intelligence
In recent years, diffusion models have demonstrated remarkable potential across diverse domains, from vision generation to language modeling. Transferring their generative capabilities to modern end-to-end autonomous driving systems has also emerged as a promising direction. However, existing diffusion-based trajectory generative models often exhibit mode collapse, where different random noise samples converge to similar trajectories after the denoising process. State-of-the-art models therefore often rely on anchored trajectories from a pre-defined trajectory vocabulary or on scene priors from the training set to mitigate collapse and enrich the diversity of generated trajectories, but such inductive biases are not available in real-world deployment, which makes generalizing to unseen scenarios challenging. In this work, we investigate whether the mode collapse challenge can be tackled effectively without assuming a pre-defined trajectory vocabulary or pre-computed scene priors. Specifically, we propose TransDiffuser, an encoder-decoder based generative trajectory planning model in which the encoded scene information and motion states serve as the multi-modal conditional input of the denoising decoder. Unlike existing approaches, we exploit a simple yet effective multi-modal representation decorrelation optimization mechanism during the denoising process to enrich the latent representation space, which better guides the downstream generation. Without any predefined trajectory anchors or pre-computed scene priors, TransDiffuser achieves a PDMS of 94.85 on the closed-loop planning-oriented benchmark NAVSIM, surpassing previous state-of-the-art methods. Qualitative evaluation further shows that TransDiffuser generates more diverse and plausible trajectories that explore more of the drivable area.
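The abstract does not spell out the form of the decorrelation objective. As a rough illustration only, the sketch below shows one common way such a penalty can be written: standardize each latent dimension across a batch and penalize the squared correlations between distinct dimensions. The function name `decorrelation_penalty`, the `eps` parameter, and the normalization are assumptions made for this example and are not taken from the paper.

```python
import math


def decorrelation_penalty(features, eps=1e-8):
    """Mean squared correlation between distinct feature dimensions.

    Hypothetical helper, not the paper's loss.
    features: list of N rows, each a list of D floats (a batch of latents).
    Returns 0.0 when all dimensions are pairwise uncorrelated in the batch.
    """
    n = len(features)
    d = len(features[0])
    if d < 2:
        return 0.0

    # Standardize each dimension across the batch (zero mean, unit variance).
    columns = []
    for j in range(d):
        col = [row[j] for row in features]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        std = math.sqrt(var) + eps
        columns.append([(x - mean) / std for x in col])

    # Sum squared correlations over all pairs of distinct dimensions.
    total = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            corr = sum(a * b for a, b in zip(columns[i], columns[j])) / n
            total += corr ** 2

    # Average over the D * (D - 1) / 2 distinct pairs.
    return total / (d * (d - 1) / 2)


# Two perfectly correlated dimensions give a penalty close to 1.
redundant = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
# Two uncorrelated dimensions give a penalty of 0.
independent = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
print(decorrelation_penalty(redundant), decorrelation_penalty(independent))
```

In a training setup, a term of this kind would typically be added to the main denoising loss with a small weight, so that latent dimensions are encouraged to carry non-redundant information; how TransDiffuser weights or applies its own term is not stated in the abstract.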
Sep-17-2025
- Country:
- Asia > China
- Anhui Province > Hefei (0.04)
- Shanghai > Shanghai (0.04)
- Europe > Germany
- Lower Saxony > Hanover (0.04)
- South America > Suriname
- North Atlantic Ocean (0.04)
- Genre:
- Research Report > Promising Solution (0.54)
- Industry:
- Automobiles & Trucks (0.87)
- Information Technology > Robotics & Automation (0.63)
- Transportation > Ground
- Road (0.73)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (0.93)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Robots > Autonomous Vehicles (0.87)
- Vision (1.00)