Uncertainty-Aware Decision Transformer for Stochastic Driving Environments

Li, Zenan, Nie, Fan, Sun, Qiao, Da, Fang, Zhao, Hang

arXiv.org Artificial Intelligence 

Offline Reinforcement Learning (RL) has emerged as a promising framework for learning policies without active interactions, making it especially appealing for autonomous driving tasks. Recent successes of Transformers have inspired casting offline RL as sequence modeling, which performs well in long-horizon tasks. However, these sequence-modeling methods are overly optimistic in stochastic environments, because they incorrectly assume that identical actions will consistently achieve the same goal. In this paper, we introduce an UNcertainty-awaRE deciSion Transformer (UNREST) for planning in stochastic driving environments without introducing additional transition or complex generative models. Specifically, UNREST estimates state uncertainties by the conditional mutual information between transitions and returns, and segments sequences accordingly. Based on the 'uncertainty accumulation' and 'temporal locality' properties that we identify in driving environments, UNREST replaces the global returns in decision transformers with less uncertain truncated returns, so that it learns from the true outcomes of agent actions rather than from environment transitions. We also dynamically evaluate environmental uncertainty during inference for cautious planning. Extensive experimental results demonstrate UNREST's superior performance in various driving scenarios and the power of our uncertainty estimation strategy.

Safe and efficient motion planning has been recognized as a crucial component of, and a bottleneck in, autonomous driving systems (Yurtsever et al., 2020). Nowadays, learning-based planning algorithms like imitation learning (IL) (Bansal et al., 2018; Zeng et al., 2019) and reinforcement learning (RL) (Chen et al., 2019a; 2020) have gained prominence with the advent of intelligent simulators (Dosovitskiy et al., 2017; Sun et al., 2022b) and large-scale datasets (Caesar et al., 2021).
Building on these, offline RL (Diehl et al., 2021; Li et al., 2022a) has become a promising framework for safety-critical driving tasks: it learns policies from offline data while retaining the ability to leverage and improve over data of various quality (Fujimoto et al., 2019; Kumar et al., 2020). Nevertheless, the application of offline RL approaches still faces practical challenges. Specifically: (1) the driving task requires long-horizon planning to avoid shortsighted, erroneous decisions (Zhang et al., 2022); (2) the stochasticity of environmental objects during driving also demands real-time responses to their actions (Diehl et al., 2021; Villaflor et al., 2022). The recent success of the Transformer architecture (Vaswani et al., 2017; Brown et al., 2020; Dosovitskiy et al., 2020) has inspired researchers to reformulate offline RL as a sequence modeling problem (Chen et al., 2021), which naturally considers the outcomes of multi-step decision-making and has demonstrated efficacy in long-horizon tasks.
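
To make the distinction between global and truncated returns concrete, the following minimal sketch contrasts the two quantities on a toy reward sequence. It is an illustration only, not UNREST's implementation: the function names `returns_to_go` and `truncated_returns` are hypothetical, and the fixed `horizon` is an assumption standing in for UNREST's uncertainty-based sequence segmentation, which is not reproduced here.

```python
import numpy as np


def returns_to_go(rewards):
    """Global return at each step t: sum of all rewards from t to the end.

    This is the kind of conditioning signal used when offline RL is cast
    as sequence modeling.
    """
    rewards = np.asarray(rewards, dtype=float)
    # Reverse, accumulate, reverse back: entry t holds sum(rewards[t:]).
    return np.cumsum(rewards[::-1])[::-1]


def truncated_returns(rewards, horizon):
    """Truncated return at each step t: sum of rewards[t : t + horizon].

    ASSUMPTION: a fixed horizon is used purely for illustration; it is a
    stand-in for an uncertainty-driven choice of where to cut the return.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    rewards = np.asarray(rewards, dtype=float)
    # prefix[k] holds sum(rewards[:k]), so a window sum is a difference.
    prefix = np.concatenate(([0.0], np.cumsum(rewards)))
    starts = np.arange(len(rewards))
    ends = np.minimum(starts + horizon, len(rewards))
    return prefix[ends] - prefix[starts]


if __name__ == "__main__":
    toy_rewards = [1.0, 2.0, 3.0, 4.0]
    print(returns_to_go(toy_rewards))
    print(truncated_returns(toy_rewards, horizon=2))
```

For the toy rewards `[1, 2, 3, 4]`, the global returns are `[10, 9, 7, 4]`, while the truncated returns with a horizon of 2 are `[3, 5, 7, 4]`. The truncated value at each step depends only on nearby rewards, so it is less affected by stochastic events far in the future.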