Mardani, Morteza
Heavy-Tailed Diffusion Models
Pandey, Kushagra, Pathak, Jaideep, Xu, Yilun, Mandt, Stephan, Pritchard, Michael, Vahdat, Arash, Mardani, Morteza
Diffusion models achieve state-of-the-art generation quality across many applications, but their ability to capture rare or extreme events in heavy-tailed distributions remains unclear. In this work, we show that traditional diffusion and flow-matching models with standard Gaussian priors fail to capture heavy-tailed behavior. We address this by repurposing the diffusion framework for heavy-tail estimation using multivariate Student-t distributions. We develop a tailored perturbation kernel and derive the denoising posterior based on the conditional Student-t distribution for the backward process. Inspired by $\gamma$-divergence for heavy-tailed distributions, we derive a training objective for heavy-tailed denoisers. The resulting framework introduces controllable tail generation using only a single scalar hyperparameter, making it easily tunable for diverse real-world distributions. As specific instantiations of our framework, we introduce t-EDM and t-Flow, extensions of existing diffusion and flow models that employ a Student-t prior. Remarkably, our approach is readily compatible with standard Gaussian diffusion models and requires only minimal code changes. Empirically, we show that our t-EDM and t-Flow outperform standard diffusion models in heavy-tail estimation on high-resolution weather datasets in which generating rare and extreme events is crucial.
Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models
Daras, Giannis, Nie, Weili, Kreis, Karsten, Dimakis, Alex, Mardani, Morteza, Kovachki, Nikola Borislavov, Vahdat, Arash
Using image models naively for solving inverse video problems often suffers from flickering, texture-sticking, and temporal inconsistency in generated videos. To tackle these problems, in this paper, we view frames as continuous functions in the 2D space, and videos as a sequence of continuous warping transformations between different frames. This perspective allows us to train function space diffusion models only on images and utilize them to solve temporally correlated inverse problems. The function space diffusion models need to be equivariant with respect to the underlying spatial transformations. To ensure temporal consistency, we introduce a simple post-hoc test-time guidance towards (self)-equivariant solutions. Our method allows us to deploy state-of-the-art latent diffusion models such as Stable Diffusion XL to solve video inverse problems. We demonstrate the effectiveness of our method for video inpainting and $8\times$ video super-resolution, outperforming existing techniques based on noise transformations. We provide generated video results: https://giannisdaras.github.io/warped_diffusion.github.io/.
Stochastic Flow Matching for Resolving Small-Scale Physics
Fotiadis, Stathi, Brenowitz, Noah, Geffner, Tomas, Cohen, Yair, Pritchard, Michael, Vahdat, Arash, Mardani, Morteza
Conditioning diffusion and flow models have proven effective for super-resolving small-scale details in natural images. However, in physical sciences such as weather, super-resolving small-scale details poses significant challenges due to: (i) misalignment between input and output distributions (i.e., solutions to distinct partial differential equations (PDEs) follow different trajectories), (ii) multi-scale dynamics, deterministic dynamics at large scales vs. stochastic at small scales, and (iii) limited data, increasing the risk of overfitting. To address these challenges, we propose encoding the inputs to a latent base distribution that is closer to the target distribution, followed by flow matching to generate small-scale physics. The encoder captures the deterministic components, while flow matching adds stochastic small-scale details. To account for uncertainty in the deterministic part, we inject noise into the encoder's output using an adaptive noise scaling mechanism, which is dynamically adjusted based on maximum-likelihood estimates of the encoder's predictions. We conduct extensive experiments on both the realworld CWA weather dataset and the PDE-based Kolmogorov dataset, with the CWA task involving super-resolving the weather variables for the region of Taiwan from 25 km to 2 km scales. Our results show that the proposed stochastic flow matching (SFM) framework significantly outperforms existing methods such as conditional diffusion and flows. Resolving small-scale physics is crucial in many scientific applications (Wilby et al., 1998; Rampal et al., 2022; 2024). For instance, in the atmospheric sciences, accurately capturing small-scale dynamics is essential for local planning and disaster mitigation. The success of conditional diffusion models in super-resolving natural images and videos (Song et al., 2021; Batzolis et al., 2021; Hoogeboom et al., 2023) has recently been extended to super-resolving small-scale physics (Aich et al., 2024; Ling et al., 2024). However, this task faces significant challenges: (C1) Input and target data are often spatially misaligned due to differing PDE solutions operating at various resolutions, leading to divergent trajectories. Additionally, the input and target variables (channels) often represent different physical quantities, causing further misalignment. Few efforts have been made to directly address these challenges in generative learning. Prior work typically relies on residual learning approaches (Mardani et al., 2023; Zhao et al., 2021).
Repulsive Score Distillation for Diverse Sampling of Diffusion Models
Zilberstein, Nicolas, Mardani, Morteza, Segarra, Santiago
Score distillation sampling has been pivotal for integrating diffusion models into generation of complex visuals. Despite impressive results it suffers from mode collapse and lack of diversity. To cope with this challenge, we leverage the gradient flow interpretation of score distillation to propose Repulsive Score Distillation (RSD). In particular, we propose a variational framework based on repulsion of an ensemble of particles that promotes diversity. Using a variational approximation that incorporates a coupling among particles, the repulsion appears as a simple regularization that allows interaction of particles based on their relative pairwise similarity, measured e.g., via radial basis kernels. We design RSD for both unconstrained and constrained sampling scenarios. For constrained sampling we focus on inverse problems in the latent space that leads to an augmented variational formulation, that strikes a good balance between compute, quality and diversity. Our extensive experiments for text-to-image generation, and inverse problems demonstrate that RSD achieves a superior trade-off between diversity and quality compared with state-of-the-art alternatives.
Generative Data Assimilation of Sparse Weather Station Observations at Kilometer Scales
Manshausen, Peter, Cohen, Yair, Pathak, Jaideep, Pritchard, Mike, Garg, Piyush, Mardani, Morteza, Kashinath, Karthik, Byrne, Simon, Brenowitz, Noah
Data assimilation of observational data into full atmospheric states is essential for weather forecast model initialization. Recently, methods for deep generative data assimilation have been proposed which allow for using new input data without retraining the model. They could also dramatically accelerate the costly data assimilation process used in operational regional weather models. Here, in a central US testbed, we demonstrate the viability of score-based data assimilation in the context of realistically complex km-scale weather. We train an unconditional diffusion model to generate snapshots of a state-of-the-art km-scale analysis product, the High Resolution Rapid Refresh. Then, using score-based data assimilation to incorporate sparse weather station data, the model produces maps of precipitation and surface winds. The generated fields display physically plausible structures, such as gust fronts, and sensitivity tests confirm learnt physics through multivariate relationships. Preliminary skill analysis shows the approach already outperforms a naive baseline of the High-Resolution Rapid Refresh system itself. By incorporating observations from 40 weather stations, 10\% lower RMSEs on left-out stations are attained. Despite some lingering imperfections such as insufficiently disperse ensemble DA estimates, we find the results overall an encouraging proof of concept, and the first at km-scale. It is a ripe time to explore extensions that combine increasingly ambitious regional state generators with an increasing set of in situ, ground-based, and satellite remote sensing data streams.
Compositional Text-to-Image Generation with Dense Blob Representations
Nie, Weili, Liu, Sifei, Mardani, Morteza, Liu, Chao, Eckart, Benjamin, Vahdat, Arash
Existing text-to-image models struggle to follow complex text prompts, raising the need for extra grounding inputs for better controllability. In this work, we propose to decompose a scene into visual primitives - denoted as dense blob representations - that contain fine-grained details of the scene while being modular, human-interpretable, and easy-to-construct. Based on blob representations, we develop a blob-grounded text-to-image diffusion model, termed BlobGEN, for compositional generation. Particularly, we introduce a new masked cross-attention module to disentangle the fusion between blob representations and visual features. To leverage the compositionality of large language models (LLMs), we introduce a new in-context learning approach to generate blob representations from text prompts. Our extensive experiments show that BlobGEN achieves superior zero-shot generation quality and better layout-guided controllability on MS-COCO. When augmented by LLMs, our method exhibits superior numerical and spatial correctness on compositional image generation benchmarks.
DiffObs: Generative Diffusion for Global Forecasting of Satellite Observations
Stock, Jason, Pathak, Jaideep, Cohen, Yair, Pritchard, Mike, Garg, Piyush, Durran, Dale, Mardani, Morteza, Brenowitz, Noah
This work presents an autoregressive generative diffusion model (DiffObs) to predict the global evolution of daily precipitation, trained on a satellite observational product, and assessed with domain-specific diagnostics. The model is trained to probabilistically forecast day-ahead precipitation. Nonetheless, it is stable for multi-month rollouts, which reveal a qualitatively realistic superposition of convectively coupled wave modes in the tropics. Cross-spectral analysis confirms successful generation of low frequency variations associated with the Madden--Julian oscillation, which regulates most subseasonal to seasonal predictability in the observed atmosphere, and convectively coupled moist Kelvin waves with approximately correct dispersion relationships. Despite secondary issues and biases, the results affirm the potential for a next generation of global diffusion models trained on increasingly sparse, and increasingly direct and differentiated observations of the world, for practical applications in subseasonal and climate prediction.
Residual Diffusion Modeling for Km-scale Atmospheric Downscaling
Mardani, Morteza, Brenowitz, Noah, Cohen, Yair, Pathak, Jaideep, Chen, Chieh-Yu, Liu, Cheng-Chin, Vahdat, Arash, Kashinath, Karthik, Kautz, Jan, Pritchard, Mike
Predictions of weather hazard require expensive km-scale simulations driven by coarser global inputs. Here, a cost-effective stochastic downscaling model is trained from a high-resolution 2-km weather model over Taiwan conditioned on 25-km ERA5 reanalysis. To address the multi-scale machine learning challenges of weather data, we employ a two-step approach Corrector Diffusion (\textit{CorrDiff}), where a UNet prediction of the mean is corrected by a diffusion step. Akin to Reynolds decomposition in fluid dynamics, this isolates generative learning to the stochastic scales. \textit{CorrDiff} exhibits skillful RMSE and CRPS and faithfully recovers spectra and distributions even for extremes. Case studies of coherent weather phenomena reveal appropriate multivariate relationships reminiscent of learnt physics: the collocation of intense rainfall and sharp gradients in fronts and extreme winds and rainfall bands near the eyewall of typhoons. Downscaling global forecasts successfully retains many of these benefits, foreshadowing the potential of end-to-end, global-to-km-scales machine learning weather predictions.
A Variational Perspective on Solving Inverse Problems with Diffusion Models
Mardani, Morteza, Song, Jiaming, Kautz, Jan, Vahdat, Arash
Diffusion models have emerged as a key pillar of foundation models in visual domains. One of their critical applications is to universally solve different downstream inverse tasks via a single diffusion prior without re-training for each task. Most inverse tasks can be formulated as inferring a posterior distribution over data (e.g., a full image) given a measurement (e.g., a masked image). This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable. To cope with this challenge, we propose a variational approach that by design seeks to approximate the true posterior distribution. We show that our approach naturally leads to regularization by denoising diffusion process (RED-Diff) where denoisers at different timesteps concurrently impose different structural constraints over the image. To gauge the contribution of denoisers from different timesteps, we propose a weighting mechanism based on signal-to-noise-ratio (SNR). Our approach provides a new variational perspective for solving inverse problems with diffusion models, allowing us to formulate sampling as stochastic optimization, where one can simply apply off-the-shelf solvers with lightweight iterates. Our experiments for image restoration tasks such as inpainting and superresolution demonstrate the strengths of our method compared with state-of-the-art sampling-based diffusion models.
Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions
Sahiner, Arda, Ergen, Tolga, Ozturkler, Batu, Bartan, Burak, Pauly, John, Mardani, Morteza, Pilanci, Mert
Generative Adversarial Networks (GANs) are commonly used for modeling complex distributions of data. Both the generators and discriminators of GANs are often modeled by neural networks, posing a non-transparent optimization problem which is non-convex and non-concave over the generator and discriminator, respectively. Such networks are often heuristically optimized with gradient descent-ascent (GDA), but it is unclear whether the optimization problem contains any saddle points, or whether heuristic methods can find them in practice. In this work, we analyze the training of Wasserstein GANs with two-layer neural network discriminators through the lens of convex duality, and for a variety of generators expose the conditions under which Wasserstein GANs can be solved exactly with convex optimization approaches, or can be represented as convex-concave games. Using this convex duality interpretation, we further demonstrate the impact of different activation functions of the discriminator. Our observations are verified with numerical results demonstrating the power of the convex interpretation, with applications in progressive training of convex architectures corresponding to linear generators and quadratic-activation discriminators for CelebA image generation.