Jiang, Chiyu Max
SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout
Jiang, Chiyu Max, Bai, Yijing, Cornman, Andre, Davis, Christopher, Huang, Xiukun, Jeon, Hong, Kulshrestha, Sakshum, Lambert, John, Li, Shuangyu, Zhou, Xuanyu, Fuertes, Carlos, Yuan, Chang, Tan, Mingxing, Zhou, Yin, Anguelov, Dragomir
Realistic and interactive scene simulation is a key prerequisite for autonomous vehicle (AV) development. In this work, we present SceneDiffuser, a scene-level diffusion prior designed for traffic simulation. It offers a unified framework that addresses two key stages of simulation: scene initialization, which involves generating initial traffic layouts, and scene rollout, which encompasses the closed-loop simulation of agent behaviors. While diffusion models have proven effective in learning realistic and multimodal agent distributions, several challenges remain, including controllability, maintaining realism in closed-loop simulation, and ensuring inference efficiency. To address these issues, we introduce amortized diffusion for simulation, a novel diffusion denoising paradigm that amortizes the computational cost of denoising over future simulation steps, significantly reducing the cost per rollout step (16x fewer inference steps) while also mitigating closed-loop errors. We further enhance controllability through generalized hard constraints, a simple yet effective inference-time constraint mechanism, as well as language-based constrained scene generation via few-shot prompting of a large language model (LLM). Our investigations into model scaling reveal that increased computational resources significantly improve overall simulation realism. We demonstrate the effectiveness of our approach on the Waymo Open Sim Agents Challenge, achieving top open-loop performance and the best closed-loop performance among diffusion models.
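To make the amortized-diffusion idea concrete, the following Python sketch keeps a sliding buffer of future steps at staggered noise levels and runs a single denoising pass per rollout step, committing the fully denoised front of the buffer and appending fresh noise at the back. The function names, noise schedule, and buffer layout are assumptions for illustration only, not SceneDiffuser's actual interface.

# Hypothetical sketch of an amortized-diffusion rollout loop: instead of
# running all K denoising steps for every simulation step, a sliding buffer
# of future steps is kept at staggered noise levels and each rollout step
# performs a single denoising pass. Names (denoise_fn, sigmas, etc.) are
# illustrative placeholders, not the paper's API.
import numpy as np

def amortized_rollout(denoise_fn, scene_context, horizon,
                      num_noise_levels=16, state_dim=4, rng=None):
    """Roll out `horizon` simulation steps with one denoiser call per step.

    denoise_fn(noisy_future, sigmas, context) -> slightly-less-noisy future,
    where noisy_future has shape [num_noise_levels, state_dim] and sigmas is
    a vector of per-step noise levels (largest for the farthest future step).
    """
    rng = rng or np.random.default_rng(0)
    # Noise schedule: the nearest future step is nearly clean, the farthest is pure noise.
    sigmas = np.linspace(0.0, 1.0, num_noise_levels)
    # Initialize the buffer at the assigned noise levels (in practice this
    # would be warm-started from logged history rather than from zeros).
    buffer = rng.normal(size=(num_noise_levels, state_dim)) * sigmas[:, None]
    rollout = []
    for _ in range(horizon):
        # One denoising pass moves every buffered step down one noise level.
        buffer = denoise_fn(buffer, sigmas, scene_context)
        # The front of the buffer is now fully denoised: commit it to the rollout.
        next_state = buffer[0]
        rollout.append(next_state)
        scene_context = next_state  # closed-loop: condition on the executed step
        # Shift the buffer and append a fresh pure-noise sample at the far end.
        buffer = np.concatenate([buffer[1:], rng.normal(size=(1, state_dim))], axis=0)
    return np.stack(rollout)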
MotionDiffuser: Controllable Multi-Agent Motion Prediction using Diffusion
Jiang, Chiyu Max, Cornman, Andre, Park, Cheolho, Sapp, Ben, Zhou, Yin, Anguelov, Dragomir
We present MotionDiffuser, a diffusion-based representation for the joint distribution of future trajectories over multiple agents. Such a representation has several key advantages: first, our model learns a highly multimodal distribution that captures diverse future outcomes. Second, the simple predictor design requires only a single L2 loss training objective and does not depend on trajectory anchors. Third, our model is capable of learning the joint distribution for the motion of multiple agents in a permutation-invariant manner. Furthermore, we utilize a compressed trajectory representation via PCA, which improves model performance and allows for efficient computation of the exact sample log probability. Subsequently, we propose a general constrained sampling framework that enables controlled trajectory sampling based on differentiable cost functions. This strategy enables a host of applications such as enforcing rules and physical priors, or creating tailored simulation scenarios. MotionDiffuser can be combined with existing backbone architectures to achieve top motion forecasting results. We obtain state-of-the-art results for multi-agent motion prediction on the Waymo Open Motion Dataset.
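The constrained sampling idea can be sketched as cost-gradient guidance during reverse diffusion: at each step the denoised trajectory estimate is scored by a differentiable cost, and the gradient of that cost nudges the sample toward constraint-satisfying trajectories. The denoiser interface, noise schedule, and example speed-limit cost below are hypothetical placeholders, not MotionDiffuser's implementation.

# Illustrative sketch (not the paper's code) of constrained trajectory
# sampling via cost-gradient guidance at each reverse-diffusion step.
import torch

def speed_limit_cost(traj, max_step=2.0):
    # Penalize per-step displacements that exceed a speed limit.
    step = traj[:, 1:] - traj[:, :-1]            # [agents, T-1, 2]
    speed = step.norm(dim=-1)
    return torch.relu(speed - max_step).square().sum()

@torch.no_grad()
def constrained_sample(denoiser, sigmas, num_agents, horizon, guide_scale=1.0):
    # sigmas: 1D tensor of decreasing noise levels; denoiser(x, sigma) -> x0 estimate.
    x = torch.randn(num_agents, horizon, 2) * sigmas[0]   # start from pure noise
    for i, sigma in enumerate(sigmas[:-1]):
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            x0_hat = denoiser(x_in, sigma)                 # denoised trajectory estimate
            cost = speed_limit_cost(x0_hat)
            grad = torch.autograd.grad(cost, x_in)[0]
        # Standard Euler reverse step toward the denoised estimate ...
        d = (x - x0_hat.detach()) / sigma
        x = x + d * (sigmas[i + 1] - sigma)
        # ... plus a guidance step that pushes the sample toward lower cost.
        x = x - guide_scale * grad
    return x

The guidance scale trades off constraint satisfaction against staying on the learned trajectory manifold; too large a scale can pull samples off-distribution.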
MeshfreeFlowNet: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework
Jiang, Chiyu Max, Esmaeilzadeh, Soheil, Azizzadenesheli, Kamyar, Kashinath, Karthik, Mustafa, Mustafa, Tchelepi, Hamdi A., Marcus, Philip, Prabhat, Anandkumar, Anima
We propose MeshfreeFlowNet, a novel deep learning-based super-resolution framework to generate continuous (grid-free) spatiotemporal solutions from the low-resolution inputs. While being computationally efficient, MeshfreeFlowNet accurately recovers the fine-scale quantities of interest. MeshfreeFlowNet allows for: (i) the output to be sampled at all spatiotemporal resolutions, (ii) a set of Partial Differential Equation (PDE) constraints to be imposed, and (iii) training on fixed-size inputs on arbitrarily sized spatiotemporal domains owing to its fully convolutional encoder.
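A minimal sketch of how a PDE constraint can be imposed on a continuous space-time decoder: field values are queried at continuous coordinates and the autograd derivatives of the output are penalized against a governing-equation residual (here, 2D incompressibility du/dx + dv/dy = 0 as an example constraint). The decoder and latent_grid names are assumed stand-ins, not MeshfreeFlowNet's actual API.

# Sketch of a PDE-constrained training loss for a continuous (grid-free)
# space-time decoder, under assumed interfaces.
import torch

def pde_constrained_loss(decoder, latent_grid, coords, targets, lam=0.1):
    """coords: [N, 3] continuous (x, y, t) query points; targets: [N, 2] observed (u, v)."""
    coords = coords.clone().requires_grad_(True)
    pred = decoder(latent_grid, coords)            # [N, 2] predicted (u, v) at the queries
    data_loss = torch.mean((pred - targets) ** 2)

    # Autograd gives exact derivatives of the continuous output w.r.t. coordinates.
    du = torch.autograd.grad(pred[:, 0].sum(), coords, create_graph=True)[0]
    dv = torch.autograd.grad(pred[:, 1].sum(), coords, create_graph=True)[0]
    divergence = du[:, 0] + dv[:, 1]               # du/dx + dv/dy
    pde_loss = torch.mean(divergence ** 2)

    # Supervised reconstruction term plus a weighted physics-residual term.
    return data_loss + lam * pde_loss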