Kulshrestha, Sakshum
SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout
Jiang, Chiyu Max, Bai, Yijing, Cornman, Andre, Davis, Christopher, Huang, Xiukun, Jeon, Hong, Kulshrestha, Sakshum, Lambert, John, Li, Shuangyu, Zhou, Xuanyu, Fuertes, Carlos, Yuan, Chang, Tan, Mingxing, Zhou, Yin, Anguelov, Dragomir
Realistic and interactive scene simulation is a key prerequisite for autonomous vehicle (AV) development. In this work, we present SceneDiffuser, a scene-level diffusion prior designed for traffic simulation. It offers a unified framework that addresses two key stages of simulation: scene initialization, which involves generating initial traffic layouts, and scene rollout, which encompasses the closed-loop simulation of agent behaviors. While diffusion models have been proven effective in learning realistic and multimodal agent distributions, several challenges remain, including controllability, maintaining realism in closed-loop simulations, and ensuring inference efficiency. To address these issues, we introduce amortized diffusion for simulation. This novel diffusion denoising paradigm amortizes the computational cost of denoising over future simulation steps, significantly reducing the cost per rollout step (16x less inference steps) while also mitigating closed-loop errors. We further enhance controllability through the introduction of generalized hard constraints, a simple yet effective inference-time constraint mechanism, as well as language-based constrained scene generation via few-shot prompting of a large language model (LLM). Our investigations into model scaling reveal that increased computational resources significantly improve overall simulation realism. We demonstrate the effectiveness of our approach on the Waymo Open Sim Agents Challenge, achieving top open-loop performance and the best closed-loop performance among diffusion models.
A Scalable Training Strategy for Blind Multi-Distribution Noise Removal
Zhang, Kevin, Kulshrestha, Sakshum, Metzler, Christopher
Despite recent advances, developing general-purpose universal denoising and artifact-removal networks remains largely an open problem: Given fixed network weights, one inherently trades-off specialization at one task (e.g.,~removing Poisson noise) for performance at another (e.g.,~removing speckle noise). In addition, training such a network is challenging due to the curse of dimensionality: As one increases the dimensions of the specification-space (i.e.,~the number of parameters needed to describe the noise distribution) the number of unique specifications one needs to train for grows exponentially. Uniformly sampling this space will result in a network that does well at very challenging problem specifications but poorly at easy problem specifications, where even large errors will have a small effect on the overall mean squared error. In this work we propose training denoising networks using an adaptive-sampling/active-learning strategy. Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape. This approximation allows us to reduce training times by almost two orders of magnitude. We test our method on simulated joint Poisson-Gaussian-Speckle noise and demonstrate that with our proposed training strategy, a single blind, generalist denoiser network can achieve peak signal-to-noise ratios within a uniform bound of specialized denoiser networks across a large range of operating conditions. We also capture a small dataset of images with varying amounts of joint Poisson-Gaussian-Speckle noise and demonstrate that a universal denoiser trained using our adaptive-sampling strategy outperforms uniformly trained baselines.
Secret-Keeping in Question Answering
Rollings, Nathaniel W., O'Sullivan, Kent, Kulshrestha, Sakshum
Existing question-answering research focuses on unanswerable questions in the context of always providing an answer when a system can\dots but what about cases where a system {\bf should not} answer a question. This can either be to protect sensitive users or sensitive information. Many models expose sensitive information under interrogation by an adversarial user. We seek to determine if it is possible to teach a question-answering system to keep a specific fact secret. We design and implement a proof-of-concept architecture and through our evaluation determine that while possible, there are numerous directions for future research to reduce system paranoia (false positives), information leakage (false negatives) and extend the implementation of the work to more complex problems with preserving secrecy in the presence of information aggregation.