Constrained Posterior Sampling: Time Series Generation with Hard Constraints

Narasimhan, Sai Shankar, Agarwal, Shubhankar, Rout, Litu, Shakkottai, Sanjay, Chinchali, Sandeep P.

arXiv.org Artificial Intelligence 

Generating realistic time series samples is crucial for stress-testing models and protecting user privacy by using synthetic data. In engineering and safety-critical applications, these samples must meet certain hard constraints that are domainspecific or naturally imposed by physics or nature. Consider, for example, generating electricity demand patterns with constraints on peak demand times. This can be used to stress-test the functioning of power grids during adverse weather conditions. Existing approaches for generating constrained time series are either not scalable or degrade sample quality. To address these challenges, we introduce Constrained Posterior Sampling (CPS), a diffusion-based sampling algorithm that aims to project the posterior mean estimate into the constraint set after each denoising update. We provide theoretical justifications highlighting the impact of our projection step on sampling. Empirically, CPS outperforms state-of-the-art methods in sample quality and similarity to real time series by around 10% and 42%, respectively, on real-world stocks, traffic, and air quality datasets. Synthesizing realistic time series samples can aid in "what-if" scenario analysis, stress-testing machine learning (ML) models (Rizzato et al., 2022; Gowal et al., 2021), anonymizing private user data (Yoon et al., 2020), etc. Current approaches for time series generation use state-of-the-art (SOTA) generative models, such as Generative Adversarial Networks (GANs) (Yoon et al., 2019; Donahue et al., 2018) and Diffusion Models (DMs) (Tashiro et al., 2021; Alcaraz & Strodthoff, 2023; Narasimhan et al., 2024), to generate high-fidelity time series samples. GPT-4 (Bubeck et al., 2023) and Stable Diffusion (Podell et al., 2023), has increased the focus on constraining the outputs from these models, Note that we cannot clearly define the notion of a constraint set in these domains. For example, verifying if the image of a hand has 6 fingers is practically hard, as all deep-learned perception models for this task have associated prediction errors. However, our key insight is that we can describe a time series through statistical features computed using well-defined functions.