Goto

Collaborating Authors

 interpolation model




ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting

arXiv.org Artificial Intelligence

We introduce ODE-GS, a novel approach that integrates 3D Gaussian Splatting with latent neural ordinary differential equations (ODEs) to enable future extrapolation of dynamic 3D scenes. Unlike existing dynamic scene reconstruction methods, which rely on time-conditioned deformation networks and are limited to interpolation within a fixed time window, ODE-GS eliminates timestamp dependency by modeling Gaussian parameter trajectories as continuous-time latent dynamics. Our approach first learns an interpolation model to generate accurate Gaussian trajectories within the observed window, then trains a Transformer encoder to aggregate past trajectories into a latent state evolved via a neural ODE. Finally, numerical integration produces smooth, physically plausible future Gaussian trajectories, enabling rendering at arbitrary future timestamps. On the D-NeRF, NVFi, and HyperNeRF benchmarks, ODE-GS achieves state-of-the-art extrapolation performance, improving metrics by 19.8% compared to leading baselines, demonstrating its ability to accurately represent and predict 3D scene dynamics.


Mapping Galaxy Images Across Ultraviolet, Visible and Infrared Bands Using Generative Deep Learning

arXiv.org Artificial Intelligence

ABSTRACT We demonstrate that generative deep learning can translate galaxy observations across ultraviolet, visible, and infrared photometric bands. Leveraging mock observations from the Illustris simulations, we develop and validate a supervised image-to-image model capable of performing both band interpolation and extrapolation. The resulting trained models exhibit high fidelity in generating outputs, as verified by both general image comparison metrics (MAE, SSIM, PSNR) and specialized astronomical metrics (GINI coefficient, M20). Moreover, we show that our model can be used to predict real-world observations, using data from the DECaLS survey as a case study. These findings highlight the potential of generative learning to augment astronomical datasets, enabling efficient exploration of multi-band information in regions where observations are incomplete. This work opens new pathways for optimizing mission planning, guiding high-resolution follow-ups, and enhancing our understanding of galaxy morphology and evolution. INTRODUCTION Galaxies can appear quite different when observed across various photometric bands due to the wavelength-dependent nature of the light they emit (Kouroumpatzakis et al. 2023). In shorter wavelength bands, like the ultraviolet (U) band, galaxies reveal the presence of hot, young stars and regions of active star formation. Meanwhile, in optical bands (e.g., G, R), the light primarily comes from older, cooler stars. At longer wavelengths, such as in infrared bands, the emission is dominated by dust and the cooler, more evolved stellar populations. Studying galaxies across multiple photometric bands is essential because each band reveals different aspects of a galaxy's physical properties, offering a more complete picture of its structure, composition, and evolution. By combining observations from various bands, it is possible to trace a galaxy's star formation history, stellar populations, dust content, and even interactions with its environment. Observations made in multiple wavelengths require fewer inferences and lead to stronger conclusions being drawn about the observed galaxies.


Modulated Adaptive Fourier Neural Operators for Temporal Interpolation of Weather Forecasts

arXiv.org Artificial Intelligence

Weather and climate data are often available at limited temporal resolution, either due to storage limitations, or in the case of weather forecast models based on deep learning, their inherently long time steps. The coarse temporal resolution makes it difficult to capture rapidly evolving weather events. To address this limitation, we introduce an interpolation model that reconstructs the atmospheric state between two points in time for which the state is known. The model makes use of a novel network layer that modifies the adaptive Fourier neural operator (AFNO), which has been previously used in weather prediction and other applications of machine learning to physics problems. The modulated AFNO (ModAFNO) layer takes an embedding, here computed from the interpolation target time, as an additional input and applies a learned shift-scale operation inside the AFNO layers to adapt them to the target time. Thus, one model can be used to produce all intermediate time steps. Trained to interpolate between two time steps 6 h apart, the ModAFNO-based interpolation model produces 1 h resolution intermediate time steps that are visually nearly indistinguishable from the actual corresponding 1 h resolution data. The model reduces the RMSE loss of reconstructing the intermediate steps by approximately 50% compared to linear interpolation. We also demonstrate its ability to reproduce the statistics of extreme weather events such as hurricanes and heat waves better than 6 h resolution data. The ModAFNO layer is generic and is expected to be applicable to other problems, including weather forecasting with tunable lead time.


L4GM: Large 4D Gaussian Reconstruction Model

arXiv.org Artificial Intelligence

We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input - in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in 12M videos with a total of 300M frames. We keep our L4GM simple for scalability and build directly on top of LGM [49], a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multiview image input. L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low fps and then upsamples the representation to a higher fps to achieve temporal smoothness. We add temporal self-attention layers to the base LGM to help it learn consistency across time, and utilize a per-timestep multiview rendering loss to train the model. The representation is upsampled to a higher framerate by training an interpolation model which produces intermediate 3D Gaussian representations. We showcase that L4GM that is only trained on synthetic data generalizes extremely well on in-the-wild videos, producing high quality animated 3D assets.


Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

arXiv.org Artificial Intelligence

Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos. Similarly, we temporally align diffusion model upsamplers, turning them into temporally consistent video super resolution models. We focus on two relevant real-world applications: Simulation of in-the-wild driving data and creative content creation with text-to-video modeling. In particular, we validate our Video LDM on real driving videos of resolution 512 x 1024, achieving state-of-the-art performance. Furthermore, our approach can easily leverage off-the-shelf pre-trained image LDMs, as we only need to train a temporal alignment model in that case. Doing so, we turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280 x 2048. We show that the temporal layers trained in this way generalize to different fine-tuned text-to-image LDMs. Utilizing this property, we show the first results for personalized text-to-video generation, opening exciting directions for future content creation. Project page: https://research.nvidia.com/labs/toronto-ai/VideoLDM/


On Forecast Stability

arXiv.org Artificial Intelligence

Forecasts are typically not produced in a vacuum but in a business context, where forecasts are generated on a regular basis and interact with each other. For decisions, it may be important that forecasts do not change arbitrarily, and are stable in some sense. However, this area has received only limited attention in the forecasting literature. In this paper, we explore two types of forecast stability that we call vertical stability and horizontal stability. The existing works in the literature are only applicable to certain base models and extending these frameworks to be compatible with any base model is not straightforward. Furthermore, these frameworks can only stabilise the forecasts vertically. To fill this gap, we propose a simple linear-interpolation-based approach that is applicable to stabilise the forecasts provided by any base model vertically and horizontally. The approach can produce both accurate and stable forecasts. Using N-BEATS, Pooled Regression and LightGBM as the base models, in our evaluation on four publicly available datasets, the proposed framework is able to achieve significantly higher stability and/or accuracy compared to a set of benchmarks including a state-of-the-art forecast stabilisation method across three error metrics and six stability metrics.


A Data-driven Reduced Order Modeling Approach Applied In Context Of Numerical Analysis And Optimization Of Plastic Profile Extrusion

arXiv.org Artificial Intelligence

In course of this work, we examine the process of plastic profile extrusion, where a polymer melt is shaped inside the so-called extrusion die and fixed in its shape by solidification in the downstream calibration unit. More precise, we focus on the development of a data-driven reduced order model (ROM) for the purpose of predicting temperature distributions within the extruded profiles inside the calibration unit. Therein, the ROM functions as a first step to our overall goal of prediction based process control in order to avoid undesired warpage and damages of the final product.


Wind Field Reconstruction with Adaptive Random Fourier Features

arXiv.org Machine Learning

We investigate the use of spatial interpolation methods for reconstructing the horizontal near-surface wind field given a sparse set of measurements. In particular, random Fourier features is compared to a set of benchmark methods including Kriging and Inverse distance weighting. Random Fourier features is a linear model $\beta(\pmb x) = \sum_{k=1}^K \beta_k e^{i\omega_k \pmb x}$ approximating the velocity field, with frequencies $\omega_k$ randomly sampled and amplitudes $\beta_k$ trained to minimize a loss function. We include a physically motivated divergence penalty term $|\nabla \cdot \beta(\pmb x)|^2$, as well as a penalty on the Sobolev norm. We derive a bound on the generalization error and derive a sampling density that minimizes the bound. Following (arXiv:2007.10683 [math.NA]), we devise an adaptive Metropolis-Hastings algorithm for sampling the frequencies of the optimal distribution. In our experiments, our random Fourier features model outperforms the benchmark models.