 integration scheme


Integration Matters for Learning PDEs with Backward SDEs

Neural Information Processing Systems

Backward stochastic differential equation (BSDE)-based deep learning methods provide an alternative to Physics-Informed Neural Networks (PINNs) for solving high-dimensional partial differential equations (PDEs), offering potential algorithmic advantages in settings such as stochastic optimal control, where the PDEs of interest are tied to an underlying dynamical system. However, standard BSDE-based solvers have empirically been shown to underperform relative to PINNs in the literature. In this paper, we identify the root cause of this performance gap as a discretization bias introduced by the standard Euler-Maruyama (EM) integration scheme applied to one-step self-consistency BSDE losses, which shifts the optimization landscape off target. We find that this bias cannot be satisfactorily addressed through finer step-sizes or multi-step self-consistency losses. To properly handle this issue, we propose a Stratonovich-based BSDE formulation, which we implement with stochastic Heun integration. We show that our proposed approach completely eliminates the bias issues faced by EM integration. Furthermore, our empirical results show that our Heun-based BSDE method consistently outperforms EM-based variants and achieves competitive results with PINNs across multiple high-dimensional benchmarks. Our findings highlight the critical role of integration schemes in BSDE-based PDE solvers, an algorithmic detail that has received little attention thus far in the literature.
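The gap between the two integrators named in this abstract can be seen on a much simpler problem than a BSDE loss. The sketch below is not the paper's method: it applies Euler-Maruyama and stochastic Heun to the scalar test equation dX = sigma X dW along the same Brownian path. The function name `simulate` and all parameter values are assumptions chosen for illustration. On this equation, Heun tracks the Stratonovich solution X0 exp(sigma W_T), while Euler-Maruyama tracks the Ito solution X0 exp(sigma W_T - sigma^2 T / 2).

```python
import math
import random


def simulate(sigma=0.5, x0=1.0, t_end=1.0, n_steps=1000, seed=0):
    """Integrate dX = sigma * X dW with two schemes on one shared path.

    Returns (x_em, x_heun, w_T). All names and defaults are illustrative.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    x_em, x_heun, w = x0, x0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)

        # Euler-Maruyama: diffusion evaluated at the left endpoint only.
        x_em = x_em + sigma * x_em * dw

        # Stochastic Heun: Euler predictor, then average the diffusion
        # at the current state and at the predicted state.
        x_pred = x_heun + sigma * x_heun * dw
        x_heun = x_heun + 0.5 * sigma * (x_heun + x_pred) * dw

        w += dw
    return x_em, x_heun, w


if __name__ == "__main__":
    x_em, x_heun, w_T = simulate()
    print("Euler-Maruyama:        ", x_em)
    print("Stochastic Heun:       ", x_heun)
    print("Stratonovich reference:", math.exp(0.5 * w_T))
    print("Ito reference:         ", math.exp(0.5 * w_T - 0.125))
```

The two schemes converge to different limits on the same noise path, which is why the choice of integrator has to match the stochastic calculus in which the equation is written.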


The Vanishing Gradient Problem for Stiff Neural Differential Equations

arXiv.org Artificial Intelligence

Gradient-based optimization of neural differential equations and other parameterized dynamical systems fundamentally relies on the ability to differentiate numerical solutions with respect to model parameters. In stiff systems, it has been observed that sensitivities to parameters controlling fast-decaying modes become vanishingly small during training, leading to optimization difficulties. In this paper, we show that this vanishing gradient phenomenon is not an artifact of any particular method, but a universal feature of all A-stable and L-stable stiff numerical integration schemes. We analyze the rational stability function for general stiff integration schemes and demonstrate that the relevant parameter sensitivities, governed by the derivative of the stability function, decay to zero for large stiffness. Explicit formulas for common stiff integration schemes are provided, which illustrate the mechanism in detail. Finally, we rigorously prove that the slowest possible rate of decay for the derivative of the stability function is $O(|z|^{-1})$, revealing a fundamental limitation: all A-stable time-stepping methods inevitably suppress parameter gradients in stiff regimes, posing a significant barrier for training and parameter identification in stiff neural ODEs.
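As a small illustration of the mechanism described in this abstract, the sketch below evaluates the textbook stability functions of two common implicit schemes and their derivatives at increasingly stiff values of z. The formulas R(z) = 1/(1 - z) for backward Euler and R(z) = (1 + z/2)/(1 - z/2) for the trapezoidal rule are standard; the function names are assumptions, and the paper's own explicit formulas and proof are not reproduced here.

```python
def backward_euler_R(z):
    """Stability function of backward Euler: R(z) = 1 / (1 - z)."""
    return 1.0 / (1.0 - z)


def backward_euler_dR(z):
    """Derivative of the backward Euler stability function."""
    return 1.0 / (1.0 - z) ** 2


def trapezoidal_R(z):
    """Stability function of the trapezoidal rule."""
    return (1.0 + z / 2.0) / (1.0 - z / 2.0)


def trapezoidal_dR(z):
    """Derivative of the trapezoidal stability function."""
    return 1.0 / (1.0 - z / 2.0) ** 2


def central_difference(f, z, h=1e-6):
    """Numerical derivative, used only to cross-check the closed forms."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


if __name__ == "__main__":
    # z = h * lambda with lambda << 0 models a fast-decaying (stiff) mode.
    for z in (-1.0, -10.0, -100.0, -1000.0):
        print(z, backward_euler_dR(z), trapezoidal_dR(z))
```

For both schemes the derivative shrinks toward zero as |z| grows, so the sensitivity of one step to a parameter that controls a stiff mode shrinks with it.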


Thermodynamics-informed super-resolution of scarce temporal dynamics data

arXiv.org Artificial Intelligence

We present a method to increase the resolution of measurements of a physical system and subsequently predict its time evolution using thermodynamics-aware neural networks. Our method uses adversarial autoencoders, which reduce the dimensionality of the full order model to a set of latent variables that are enforced to match a prior, for example a normal distribution. Adversarial autoencoders are seen as generative models, and they can be trained to generate high-resolution samples from low-resolution inputs, meaning they can address the so-called super-resolution problem. Then, a second neural network is trained to learn the physical structure of the latent variables and predict their temporal evolution. This neural network is known as a structure-preserving neural network. It learns the metriplectic structure of the system and applies a physical bias to ensure that the first and second principles of thermodynamics are fulfilled. The integrated trajectories are decoded to their original dimensionality, as well as to the higher-dimensional space produced by the adversarial autoencoder, and they are compared to the ground truth solution. The method is tested with two examples of flow over a cylinder, where the fluid properties are varied between both examples.


Is there an optimal choice of configuration space for Lie group integration schemes applied to constrained MBS?

arXiv.org Artificial Intelligence

Recently, various numerical integration schemes have been proposed for simulating the dynamics of constrained multibody systems (MBS). These integration schemes operate directly on the MBS configuration space considered as a Lie group. For discrete spatial mechanical systems there are two Lie groups that can be used as configuration space: $SE(3)$ and $SO(3) \times \mathbb{R}^{3}$. Since the performance of the numerical integration scheme clearly depends on the underlying configuration space, it is important to analyze the effect of using either variant. For constrained MBS a crucial aspect is the constraint satisfaction. In this paper the constraint violation observed for the two variants is investigated. It is concluded that the $SE(3)$ formulation outperforms the $SO(3) \times \mathbb{R}^{3}$ formulation if the absolute motions of the rigid bodies, as part of a constrained MBS, belong to a motion subgroup. In all other cases both formulations are equivalent. In the latter cases, however, the $SO(3) \times \mathbb{R}^{3}$ formulation should be used since the $SE(3)$ formulation is numerically more complex.


The significance of the configuration space Lie group for the constraint satisfaction in numerical time integration of multibody systems

arXiv.org Artificial Intelligence

The dynamics simulation of multibody systems (MBS) using spatial velocities (non-holonomic velocities) requires time integration of the dynamics equations together with the kinematic reconstruction equations (relating time derivatives of configuration variables to rigid body velocities). The latter are specific to the geometry of the rigid body motion underlying a particular formulation, and thus to the configuration space (c-space) used. The proper c-space of a rigid body is the Lie group $SE(3)$, and the geometry is that of the screw motions. The rigid bodies within an MBS are further subjected to geometric constraints, often due to lower kinematic pairs that define $SE(3)$ subgroups. Traditionally, however, in MBS dynamics the translations and rotations are parameterized independently, which implies the use of the direct product group $SO(3) \times \mathbb{R}^{3}$ as rigid body c-space, although this does not account for rigid body motions. Hence, its appropriateness was recently put into perspective. In this paper the significance of the c-space for the constraint satisfaction in numerical time stepping schemes is analyzed for holonomically constrained MBS modeled with the 'absolute coordinate' approach, i.e. using the Newton-Euler equations for the individual bodies subjected to geometric constraints. It is shown that the geometric constraints a body is subjected to are exactly satisfied if they constrain the motion to a subgroup of its c-space. Since only the $SE(3)$ subgroups have a practical significance, $SE(3)$ is regarded as the appropriate c-space for the constrained rigid body. Consequently the constraints imposed by lower pair joints are exactly satisfied if the joint connects a body to the ground. For a general MBS, where the motions are not constrained to a subgroup, the $SE(3)$ and $SO(3) \times \mathbb{R}^{3}$ formulations yield the same order of accuracy.


TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision

arXiv.org Artificial Intelligence

Partial differential equations (PDEs) are instrumental for modeling dynamical systems in science and engineering. The advent of neural networks has initiated a significant shift in tackling these complexities, though challenges in accuracy persist, especially for initial value problems. In this paper, we introduce the $\textit{Time-Evolving Natural Gradient (TENG)}$, generalizing time-dependent variational principles and optimization-based time integration, leveraging natural gradient optimization to obtain high accuracy in neural-network-based PDE solutions. Our comprehensive development includes algorithms like TENG-Euler and its high-order variants, such as TENG-Heun, tailored for enhanced precision and efficiency. TENG's effectiveness is further validated through its performance, surpassing current leading methods and achieving $\textit{machine precision}$ in step-by-step optimizations across a spectrum of PDEs, including the heat equation, Allen-Cahn equation, and Burgers' equation.


Learning continuous models for continuous physics

arXiv.org Artificial Intelligence

Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering. Machine learning (ML) provides data-driven approaches to model and predict the dynamics of such systems. A core issue with this approach is that ML models are typically trained on discrete data, using ML methodologies that are not aware of underlying continuity properties. This results in models that often do not capture any underlying continuous dynamics -- either of the system of interest, or indeed of any related system. To address this challenge, we develop a convergence test based on numerical analysis theory. Our test verifies whether a model has learned a function that accurately approximates an underlying continuous dynamics. Models that fail this test fail to capture relevant dynamics, rendering them of limited utility for many scientific prediction tasks; while models that pass this test enable both better interpolation and better extrapolation in multiple ways. Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.


Time integration schemes based on neural networks for solving partial differential equations on coarse grids

arXiv.org Artificial Intelligence

The accuracy of solving partial differential equations (PDEs) on coarse grids is greatly affected by the choice of discretization schemes. In this work, we propose to learn time integration schemes based on neural networks which satisfy three distinct sets of mathematical constraints, i.e., unconstrained, semi-constrained with the root condition, and fully-constrained with both root and consistency conditions. We focus on the learning of 3-step linear multistep methods, which we subsequently apply to solve three model PDEs, i.e., the one-dimensional heat equation, the one-dimensional wave equation, and the one-dimensional Burgers' equation. The results show that the prediction error of the learned fully-constrained scheme is close to that of the Runge-Kutta method and Adams-Bashforth method. Compared to the traditional methods, the learned unconstrained and semi-constrained schemes significantly reduce the prediction error on coarse grids. On a grid that is 4 times coarser than the reference grid, the mean square error shows a reduction of up to an order of magnitude for some of the heat equation cases, and a substantial improvement in phase prediction for the wave equation. On a 32 times coarser grid, the mean square error for the Burgers' equation can be reduced by up to 35% to 40%.
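To make the consistency condition mentioned in this abstract concrete, the sketch below checks it for the classical 3-step Adams-Bashforth method and then runs that method on the test problem y' = -y. It uses the standard form sum_j alpha_j y_{n+j} = h sum_j beta_j f_{n+j}, for which consistency means rho(1) = 0 and rho'(1) = sigma(1). The helper names `is_consistent` and `ab3_solve` are assumptions; the learned schemes from the paper are not reproduced, and the root condition is not checked here.

```python
import math
from fractions import Fraction

# Classical 3-step Adams-Bashforth coefficients, ascending in j = 0..3:
# y_{n+3} - y_{n+2} = h * (5 f_n - 16 f_{n+1} + 23 f_{n+2}) / 12
AB3_ALPHA = [Fraction(0), Fraction(0), Fraction(-1), Fraction(1)]
AB3_BETA = [Fraction(5, 12), Fraction(-16, 12), Fraction(23, 12), Fraction(0)]


def is_consistent(alpha, beta):
    """Check rho(1) == 0 and rho'(1) == sigma(1) exactly."""
    rho_at_1 = sum(alpha)
    drho_at_1 = sum(j * a for j, a in enumerate(alpha))
    sigma_at_1 = sum(beta)
    return rho_at_1 == 0 and drho_at_1 == sigma_at_1


def ab3_solve(f, t0, y_start, h, n_steps):
    """Advance y' = f(t, y) with AB3 from three given starting values."""
    ys = list(y_start)
    fs = [f(t0 + i * h, y) for i, y in enumerate(ys)]
    for _ in range(n_steps):
        y_next = ys[-1] + h * (23 * fs[-1] - 16 * fs[-2] + 5 * fs[-3]) / 12
        ys.append(y_next)
        fs.append(f(t0 + (len(ys) - 1) * h, y_next))
    return ys


if __name__ == "__main__":
    print("AB3 consistent:", is_consistent(AB3_ALPHA, AB3_BETA))
    h = 0.01
    start = [math.exp(-i * h) for i in range(3)]  # exact starting values
    ys = ab3_solve(lambda t, y: -y, 0.0, start, h, 98)  # ends at t = 1
    print("error at t = 1:", abs(ys[-1] - math.exp(-1.0)))
```

Exact rational arithmetic is used for the coefficient check so that the equality tests are not affected by floating-point rounding.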


Time-vectorized numerical integration for systems of ODEs

arXiv.org Machine Learning

Stiff systems of ordinary differential equations (ODEs) and sparse training data are common in scientific problems. This paper describes efficient, implicit, vectorized methods for integrating stiff systems of ordinary differential equations through time and calculating parameter gradients with the adjoint method. The main innovation is to vectorize the problem both over the number of independent time series and over a batch or "chunk" of sequential time steps, effectively vectorizing the assembly of the implicit system of ODEs. The block-bidiagonal structure of the linearized implicit system for the backward Euler method allows for further vectorization using parallel cyclic reduction (PCR). Vectorizing over both axes of the input data provides a higher bandwidth of calculations to the computing device, allowing even problems with comparatively sparse data to fully utilize modern GPUs and achieving speedups of greater than 100x compared to standard, sequential time integration. We demonstrate the advantages of implicit, vectorized time integration with several example problems, drawn from both analytical stiff and non-stiff ODE models as well as neural ODE models. We also describe and provide a freely available open-source implementation of the methods developed here.


OceanNet: A principled neural operator-based digital twin for regional oceans

arXiv.org Artificial Intelligence

While data-driven approaches demonstrate great potential in atmospheric modeling and weather forecasting, ocean modeling poses distinct challenges due to complex bathymetry, land, vertical structure, and flow non-linearity. This study introduces OceanNet, a principled neural operator-based digital twin for ocean circulation. OceanNet uses a Fourier neural operator and predictor-evaluate-corrector integration scheme to mitigate autoregressive error growth and enhance stability over extended time scales. A spectral regularizer counteracts spectral bias at smaller scales. OceanNet is applied to the northwest Atlantic Ocean western boundary current (the Gulf Stream), focusing on the task of seasonal prediction for Loop Current eddies and the Gulf Stream meander. Trained using historical sea surface height (SSH) data, OceanNet demonstrates competitive forecast skill, outperforming the SSH predictions of an uncoupled, state-of-the-art dynamical ocean model forecast while reducing computation by a factor of 500,000. These accomplishments demonstrate the potential of physics-inspired deep neural operators as cost-effective alternatives to high-resolution numerical ocean models.