
Collaborating Authors

 Mishra, Siddhartha


Data-Driven, ML-assisted Approaches to Problem Well-Posedness

arXiv.org Artificial Intelligence

Classically, to solve differential equation problems, it is necessary to specify sufficient initial and/or boundary conditions so as to allow the existence of a unique solution. Well-posedness of differential equation problems thus involves studying the existence and uniqueness of solutions, and their dependence on such pre-specified conditions. However, in part due to mathematical necessity, these conditions are usually specified "to arbitrary precision" only on (appropriate portions of) the boundary of the space-time domain. This does not mirror how data acquisition is performed in realistic situations, where one may observe entire "patches" of solution data at arbitrary space-time locations; alternatively, one might have access to more than one solution stemming from the same differential operator. In this short work, we demonstrate how standard tools from machine and manifold learning can be used to infer, in a data-driven manner, certain well-posedness features of differential equation problems, for initial/boundary condition combinations under which rigorous existence/uniqueness theorems are not known. Our study naturally combines a data assimilation perspective with an operator-learning one.
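
As a concrete illustration of the kind of manifold-learning diagnostic the abstract alludes to, the sketch below estimates the intrinsic dimension of a cloud of solution "patches" via local PCA. The synthetic patch family, neighborhood size, and variance threshold are all illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch: estimate the intrinsic dimension of a set of
# solution "patches" via local PCA. If the patches lie on a manifold
# whose dimension matches the number of free conditions, this hints at
# how much data pins down a unique solution.
import numpy as np

def local_intrinsic_dimension(patches, k=20, var_threshold=0.99):
    """patches: (n_samples, patch_size) array of flattened solution patches."""
    dims = []
    for i in range(patches.shape[0]):
        d = np.linalg.norm(patches - patches[i], axis=1)
        nbrs = patches[np.argsort(d)[1:k + 1]]          # k nearest neighbors
        centered = nbrs - nbrs.mean(axis=0)
        s = np.linalg.svd(centered, compute_uv=False)   # local PCA spectrum
        energy = np.cumsum(s**2) / np.sum(s**2)
        dims.append(int(np.searchsorted(energy, var_threshold)) + 1)
    return float(np.median(dims))

# Synthetic test: a two-parameter family of "patches" embedded in R^30;
# the estimator should report a value close to 2.
t = np.random.default_rng(0).uniform(0, 1, size=(500, 2))
patches = np.stack([np.sin(3 * t[:, 0] + k * t[:, 1]) for k in range(30)], axis=1)
print(local_intrinsic_dimension(patches))
```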


Neuro-Symbolic AI for Analytical Solutions of Differential Equations

arXiv.org Artificial Intelligence

The understanding of physical processes has been a long-standing effort for scientists and engineers. A key step in this endeavor is to translate physical insights (laws) into precise mathematical relationships that capture the underlying phenomena. These relationships are then tested through experiments, which either validate the proposed hypothesis or suggest refinements. Among such mathematical formulations, differential equations (DEs) are especially ubiquitous across disciplines, as they describe how physical quantities evolve over time and space. Finding analytical (also referred to as explicit or closed-form) solutions to these equations, that is, mathematical expressions that satisfy the differential equation along with the given initial and boundary conditions, provides a structured way to compare theoretical predictions with experimental measurements. Moreover, analytical solutions often reveal intrinsic properties of physical systems, such as stability, periodicity, underlying symmetries, and asymptotic behavior. Thus, analytical solutions provide deep insight into how these systems behave in time and space. Despite intense efforts over centuries, there are very few methods for constructing analytical solutions of differential equations. All of them can be viewed as fundamentally compositional: they break complex equations into simpler, more manageable pieces and then systematically recombine those pieces into a final solution.
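
For readers unfamiliar with the terminology, the snippet below (a plain SymPy computation, not the paper's neuro-symbolic system) shows what an analytical solution looks like and why it exposes properties such as decay rates and periodicity directly.

```python
# Worked illustration (not the paper's method): an analytical solution
# of a simple ODE via SymPy -- the kind of closed-form expression a
# compositional solver aims to construct.
import sympy as sp

x = sp.symbols("x")
y = sp.Function("y")

# Damped linear oscillator: y'' + 2y' + 5y = 0
ode = sp.Eq(y(x).diff(x, 2) + 2 * y(x).diff(x) + 5 * y(x), 0)
print(sp.dsolve(ode, y(x)))
# y(x) = (C1*sin(2*x) + C2*cos(2*x))*exp(-x)
# The decay rate exp(-x) and the oscillation period pi can be read off
# directly; C1, C2 are fixed by initial/boundary conditions.
```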


RIGNO: A Graph-based framework for robust and accurate operator learning for PDEs on arbitrary domains

arXiv.org Artificial Intelligence

Learning the solution operators of PDEs on arbitrary domains is challenging due to the diversity of possible domain shapes, in addition to the often intricate underlying physics. We propose an end-to-end graph neural network (GNN) based neural operator to learn PDE solution operators from data on point clouds in arbitrary domains. Our multi-scale model maps data between input/output point clouds by passing it through a downsampled regional mesh. Several novel elements are incorporated to ensure resolution invariance and temporal continuity. Our model, termed RIGNO, is tested on a challenging suite of benchmarks composed of various time-dependent and steady PDEs defined on a diverse set of domains. We demonstrate that RIGNO is significantly more accurate than neural operator baselines and generalizes robustly to unseen spatial resolutions and time instances.
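
A minimal sketch of the encode-process-decode pattern the abstract describes is given below: point-cloud values are aggregated onto a coarser regional mesh, processed there, and read back at arbitrary output points. The nearest-neighbor assignment and the averaging "processor" are stand-ins for RIGNO's learned GNN layers, chosen only to make the data flow concrete.

```python
# Simplified encode-process-decode over a regional mesh (illustrative
# stand-in for learned message passing; not the RIGNO architecture).
import numpy as np

def nearest(mesh, points):
    return np.argmin(np.linalg.norm(points[:, None] - mesh[None], axis=-1), axis=1)

def encode(points, values, mesh):
    """Aggregate point-cloud values onto the nearest regional mesh node."""
    assign = nearest(mesh, points)
    agg = np.zeros(len(mesh))
    np.add.at(agg, assign, values)
    counts = np.bincount(assign, minlength=len(mesh))
    return agg / np.maximum(counts, 1)

def process(mesh_values, edges, steps=4):
    """Stand-in for learned message passing: neighbor smoothing."""
    v = mesh_values.copy()
    for _ in range(steps):
        new = v.copy()
        for i, j in edges:
            new[i] += 0.25 * (v[j] - v[i])
        v = new
    return v

def decode(mesh, mesh_values, points):
    """Read the processed mesh values back at arbitrary output points."""
    return mesh_values[nearest(mesh, points)]

rng = np.random.default_rng(0)
pts = rng.uniform(size=(200, 2))
vals = np.sin(4 * np.pi * pts[:, 0])
mesh = rng.uniform(size=(25, 2))
edges = [(i, j) for i in range(25) for j in range(25)
         if 0 < np.linalg.norm(mesh[i] - mesh[j]) < 0.35]
out = decode(mesh, process(encode(pts, vals, mesh), edges), pts)
print(out.shape)  # (200,) -- output resolution is decoupled from the mesh
```

Because neither the encoder nor the decoder fixes a grid, the same trained weights can be queried at unseen spatial resolutions, which is the resolution invariance the abstract refers to.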


NOMTO: Neural Operator-based symbolic Model approximaTion and discOvery

arXiv.org Artificial Intelligence

While many physical and engineering processes are most effectively described by non-linear symbolic models, existing non-linear symbolic regression (SR) methods are restricted to a limited set of continuous algebraic functions, thereby limiting their ability to discover higher-order non-linear differential relations. In this work, we introduce the Neural Operator-based symbolic Model approximaTion and discOvery (NOMTO) method, a novel approach to symbolic model discovery that leverages Neural Operators to encompass a broad range of symbolic operations. We demonstrate that NOMTO can successfully identify symbolic expressions containing elementary functions with singularities, special functions, and derivatives. Additionally, our experiments demonstrate that NOMTO can accurately rediscover second-order non-linear partial differential equations. It provides a powerful and flexible tool for model discovery, capable of capturing complex relations in a variety of physical systems.

Many physical and engineering processes are most effectively described by concise mathematical expressions derived through meticulous observation and analysis. The accuracy of these models is highly dependent on the quality and quantity of available data. With the emergence of large-scale datasets across diverse physical and engineering domains, deriving compact mathematical models in the form of symbolic expressions has become increasingly attainable. This methodology, known as symbolic regression (SR), aims to identify mathematical expressions that most accurately represent given datasets. SR has become indispensable in fields such as physics, biology, and engineering, where it advances knowledge and fosters innovation by uncovering underlying principles and facilitating the development of interpretable predictive models. In recent years, deep learning-based approaches have significantly advanced the field of SR by leveraging neural networks to identify mathematical expressions directly from data.
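
The following toy makes the core idea tangible: each symbolic primitive is replaced by a surrogate (exact NumPy functions here; trained neural operators in NOMTO), so a candidate expression becomes an executable object that can be scored against data. The primitive library and the exhaustive search are illustrative simplifications of the actual discovery procedure.

```python
# Schematic symbolic-discovery loop: score surrogate primitives against
# data and keep the best-fitting one. In NOMTO each library entry would
# be a neural operator, which lets derivatives and functions with
# singularities participate as candidate operations too.
import numpy as np

PRIMITIVES = {"sin": np.sin, "exp": np.exp, "square": np.square}

def score(name, x, y):
    return np.mean((PRIMITIVES[name](x) - y) ** 2)

x = np.linspace(0.0, 2.0, 100)
y = np.sin(x)                       # hidden ground truth to rediscover
best = min(PRIMITIVES, key=lambda name: score(name, x, y))
print(best)                         # -> "sin"
```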


S7: Selective and Simplified State Space Layers for Sequence Modeling

arXiv.org Artificial Intelligence

A central challenge in sequence modeling is efficiently handling tasks with extended contexts. While recent state-space models (SSMs) have made significant progress in this area, they often lack input-dependent filtering or require substantial increases in model complexity to handle input variability. We address this gap by introducing S7, a simplified yet powerful SSM that can handle input dependence while incorporating stable reparameterization and specific design choices to dynamically adjust state transitions based on input content, maintaining efficiency and performance. We prove that this reparameterization ensures stability in long-sequence modeling by keeping state transitions well-behaved over time. Additionally, it controls the gradient norm, enabling efficient training and preventing issues like exploding or vanishing gradients. S7 significantly outperforms baselines across various sequence modeling tasks, including neuromorphic event-based datasets, Long Range Arena benchmarks, and various physical and biological time series. Overall, S7 offers a more straightforward approach to sequence modeling without relying on complex, domain-specific inductive biases, achieving significant improvements across key benchmarks.

Sequence modeling is a fundamental challenge in deep learning, with applications spanning natural language processing, computer vision, audio processing, and genomics (Sutskever et al., 2014; Graves et al., 2013). The core problem lies in effectively capturing and utilizing information from long input sequences while maintaining computational efficiency. Convolutional models (Bai et al., 2018), while efficient, often cannot capture global context. The key challenge is to design a model that can (1) efficiently process very long sequences, (2) adaptively filter and retain relevant information over extended time horizons, (3) perform content-based reasoning, and (4) maintain a compact state representation. Recent advances in Deep State Space Models (Deep SSMs) (Gu et al., 2020; Hasani et al., 2020) have shown promise, but existing approaches like S4 (Gu et al., 2022a) and Mamba (Gu & Dao, 2023) still face limitations in balancing these requirements.
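
As a hedged sketch of what "input-dependent state transitions with stable reparameterization" can mean, the recurrence below gates its diagonal transition through a sigmoid of the input, so every transition entry stays in (0, 1) and the hidden state cannot blow up. Shapes and nonlinearities are illustrative assumptions, not the S7 design.

```python
# Illustrative input-dependent SSM step with a stable reparameterization
# (not the actual S7 parameterization): the diagonal transition a_t is a
# sigmoid of the input, so each entry lies in (0, 1) and the hidden
# state stays bounded over arbitrarily long sequences.
import numpy as np

rng = np.random.default_rng(0)
d_state, d_in = 8, 4
W_a = rng.normal(scale=0.1, size=(d_state, d_in))  # input -> transition gate
B = rng.normal(scale=0.1, size=(d_state, d_in))    # input -> state update
C = rng.normal(scale=0.1, size=(1, d_state))       # state -> output readout

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def selective_scan(u):
    """u: (T, d_in) input sequence -> (T,) scalar outputs."""
    x = np.zeros(d_state)
    ys = []
    for u_t in u:
        a_t = sigmoid(W_a @ u_t)     # input-dependent, contractive transition
        x = a_t * x + B @ u_t        # state "forgets" at an input-driven rate
        ys.append((C @ x).item())
    return np.array(ys)

print(selective_scan(rng.normal(size=(16, d_in))).shape)  # (16,)
```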


Generative AI for fast and accurate Statistical Computation of Fluids

arXiv.org Artificial Intelligence

We present a generative AI algorithm for addressing the challenging task of fast, accurate and robust statistical computation of three-dimensional turbulent fluid flows. Our algorithm, termed GenCFD, is based on a conditional score-based diffusion model. Through extensive numerical experimentation with both incompressible and compressible fluid flows, we demonstrate that GenCFD provides very accurate approximations of statistical quantities of interest, such as the mean, variance, point PDFs, and higher-order moments, while also generating high-quality realistic samples of turbulent fluid flows and ensuring excellent spectral resolution. In contrast, ensembles of operator learning baselines, which are trained to minimize mean (absolute) square errors, regress to the mean flow. We present rigorous theoretical results uncovering the surprising mechanisms through which diffusion models accurately generate fluid flows. These mechanisms are illustrated with solvable toy models that exhibit the relevant features of turbulent fluid flows while being amenable to explicit analytical formulas.
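
The sampling mechanism underlying a conditional score-based diffusion model can be sketched in a few lines. Below, the learned, flow-conditioned score network is replaced by an exact Gaussian score so the snippet runs end to end; the noise schedule and step rule are generic choices, not the GenCFD configuration.

```python
# Hedged sketch of conditional score-based sampling. The toy score is
# that of N(cond, sigma^2); in GenCFD the score is a learned network
# conditioned on the initial flow state.
import numpy as np

rng = np.random.default_rng(0)
sigma_max, sigma_min, n_steps = 10.0, 0.01, 200

def score(x, sigma, cond):
    # Toy stand-in: score of N(cond, sigma^2) is -(x - cond) / sigma^2.
    return -(x - cond) / sigma**2

def sample(cond, shape):
    sigmas = np.geomspace(sigma_max, sigma_min, n_steps)
    x = rng.normal(scale=sigma_max, size=shape)     # start from pure noise
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        # Euler step of the probability-flow ODE, integrated in sigma^2
        x = x + 0.5 * (s**2 - s_next**2) * score(x, s, cond)
    return x

# Many conditional samples form an ensemble from which mean, variance
# and point pdfs are estimated -- instead of regressing to a mean flow.
ens = np.stack([sample(cond=1.5, shape=(4,)) for _ in range(256)])
print(ens.mean(), ens.var())  # mean -> 1.5, variance -> O(sigma_min^2)
```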


Poseidon: Efficient Foundation Models for PDEs

arXiv.org Artificial Intelligence

We introduce Poseidon, a foundation model for learning the solution operators of PDEs. It is based on a multiscale operator transformer, with time-conditioned layer norms that enable continuous-in-time evaluations. A novel training strategy leveraging the semi-group property of time-dependent PDEs to allow for significant scaling-up of the training data is also proposed. Poseidon is pretrained on a diverse, large-scale dataset for the governing equations of fluid dynamics. It is then evaluated on a suite of 15 challenging downstream tasks that include a wide variety of PDE types and operators. We show that Poseidon exhibits excellent performance across the board by outperforming baselines significantly, in terms of both sample efficiency and accuracy. Poseidon also generalizes very well to new physics not seen during pretraining. Moreover, Poseidon scales with respect to model and data size, both for pretraining and for downstream tasks. Taken together, our results showcase the surprising ability of Poseidon to learn effective representations from a very small set of PDEs during pretraining in order to generalize well to unseen and unrelated PDEs downstream, demonstrating its potential as an effective, general purpose PDE foundation model.
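
The semi-group property mentioned above can be turned into a data-scaling trick: since S(t)S(s)u0 = S(t+s)u0 for a time-dependent PDE, every ordered pair of snapshots in a stored trajectory is a valid (input, lead time, target) training sample. A minimal sketch, with illustrative names:

```python
# Sketch of the semi-group trick: extract all ordered snapshot pairs
# from one trajectory, turning K consecutive transitions into O(K^2)
# training samples.
def semigroup_pairs(trajectory, times):
    """trajectory: list of K+1 snapshots; returns O(K^2) training samples."""
    samples = []
    for i in range(len(trajectory)):
        for j in range(i + 1, len(trajectory)):
            samples.append((trajectory[i], times[j] - times[i], trajectory[j]))
    return samples

# A trajectory with 11 snapshots yields 55 training pairs instead of the
# 10 consecutive transitions -- the "scaling-up of the training data".
print(len(semigroup_pairs(list(range(11)), times=list(range(11)))))  # 55
```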


FUSE: Fast Unified Simulation and Estimation for PDEs

arXiv.org Artificial Intelligence

Partial Differential Equations (PDEs) describe the evolution of system conditions for a very wide range of physical systems. In parametric PDEs, not only do the system conditions vary, but the underlying solution operator is also characterized by a set of discrete parameters. Traditional numerical methods based on different discretization schemes, such as Finite Differences, Finite Volumes and Finite Elements, have been developed along with fast and parallelizable implementations to tackle complex problems such as atmospheric modeling and cardiovascular biomechanics. For parametric PDEs, these methods define maps from the underlying set of discrete parameters, which describe the dynamics and the boundary/initial conditions, to physical quantities, such as velocity or pressure, that are continuous in the spatio-temporal domain. Despite their successful application, there remain well-known drawbacks of traditional solvers. To describe a particular physical phenomenon, PDE parameters and solvers need to be calibrated to precise conditions that are not known a priori and cannot easily be measured in realistic applications. Therefore, iterative and thus expensive calibration procedures are required in cases where the parameters and conditions are inferred from data [1]. Even after the solvers are calibrated, an ensemble of solutions needs to be generated to account for uncertainties in the model parameters or to assess the sensitivity of the solution to different parameters, both of which are computationally prohibitive downstream tasks [2]. For these reasons, a large variety of deep learning algorithms have recently been proposed for scientific applications, broadly categorized into surrogate and inverse modeling algorithms, which either reduce the computational time of complex simulations or infer missing discrete information from data to calibrate a simulator to precise conditions.
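
To make the forward/inverse distinction concrete, here is a toy example (the decaying-wave model and the grid search are assumptions for illustration): a forward map sends discrete parameters to a field that is continuous in space, and calibration inverts noisy point measurements back to those parameters.

```python
# Toy forward (surrogate) map and brute-force inverse calibration -- the
# expensive iterative procedure that learned inverse models aim to
# amortize. The physical model is purely illustrative.
import numpy as np

def forward(theta, x):
    """theta = (amplitude, decay); returns the field at locations x."""
    a, k = theta
    return a * np.exp(-k * x) * np.cos(2 * np.pi * x)

rng = np.random.default_rng(0)
x_obs = rng.uniform(0, 2, size=15)          # sparse sensor locations
theta_true = (1.3, 0.7)
y_obs = forward(theta_true, x_obs) + 0.01 * rng.normal(size=x_obs.shape)

# Grid-search calibration of the discrete parameters from data
grid = [(a, k) for a in np.linspace(0.5, 2.0, 40)
               for k in np.linspace(0.1, 1.5, 40)]
theta_hat = min(grid, key=lambda th: np.sum((forward(th, x_obs) - y_obs) ** 2))
print(theta_hat)  # close to (1.3, 0.7)
```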


SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models

arXiv.org Artificial Intelligence

Despite the effectiveness of data selection for large language models (LLMs) during pretraining and instruction fine-tuning phases, improving data efficiency in supervised fine-tuning (SFT) for specialized domains poses significant challenges due to the complexity of fine-tuning data. To bridge this gap, we introduce an effective and scalable data selection method for SFT, SmallToLarge (S2L), which leverages training trajectories from small models to guide the data selection for larger models. We demonstrate through extensive experiments that S2L significantly improves data efficiency in SFT for mathematical problem-solving, reducing the training data to just 11% of the original MathInstruct dataset (Yue et al., 2023) while matching full-dataset performance and outperforming state-of-the-art data selection algorithms by an average of 4.7% across six in- and out-of-domain evaluation datasets. Remarkably, selecting only 50K examples for SFT, S2L achieves 32.7% accuracy on the most challenging MATH (Hendrycks et al., 2021) benchmark, improving Phi-2 (Li et al., 2023b) by 16.6%. In clinical text summarization on the MIMIC-III dataset (Johnson et al., 2016), S2L again outperforms training on the full dataset while using only 50% of the data. Notably, S2L can perform data selection using a reference model 40x smaller than the target model, proportionally reducing the cost of data selection.
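
A minimal sketch of the trajectory-based selection the abstract describes follows, with the clustering algorithm and the balanced-sampling rule as illustrative implementation choices rather than the paper's exact recipe.

```python
# Hedged sketch of the S2L idea: record each example's training-loss
# trajectory on a small reference model, cluster the trajectories, then
# sample evenly across clusters so the selected subset covers all
# observed learning behaviors.
import numpy as np
from sklearn.cluster import KMeans

def s2l_select(trajectories, budget, n_clusters=10, seed=0):
    """trajectories: (n_examples, n_checkpoints) losses from a small model."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(trajectories)
    rng = np.random.default_rng(seed)
    selected, per_cluster = [], budget // n_clusters
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        take = min(per_cluster, len(idx))
        selected.extend(rng.choice(idx, size=take, replace=False))
    return np.array(selected)

# Usage: pick a 50K-example subset for the large model from signals
# recorded on a much smaller reference model.
# subset = s2l_select(loss_trajectories, budget=50_000)
```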


Numerical analysis of physics-informed neural networks and related models in physics-informed machine learning

arXiv.org Artificial Intelligence

Physics-informed neural networks (PINNs) and their variants have been very popular in recent years as algorithms for the numerical simulation of both forward and inverse problems for partial differential equations. This article aims to provide a comprehensive review of currently available results on the numerical analysis of PINNs and related models that constitute the backbone of physics-informed machine learning. We provide a unified framework in which analysis of the various components of the error incurred by PINNs in approximating PDEs can be effectively carried out. A detailed review of available results on approximation, generalization and training errors and their behavior with respect to the type of the PDE and the dimension of the underlying domain is presented. In particular, the role of the regularity of the solutions and their stability to perturbations in the error analysis is elucidated. Numerical results are also presented to illustrate the theory. We identify training errors as a key bottleneck which can adversely affect the overall performance of various models in physics-informed machine learning.
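
Schematically, the error decomposition that organizes such analyses can be written as follows; the notation and constants are illustrative, and the precise statements in the article depend on the PDE and its stability properties.

```latex
% Schematic decomposition (illustrative notation): a stability bound for
% the PDE converts the residual-based loss into a solution-error bound,
% which then splits into three contributions.
\[
  \|u_{\theta^*} - u\|
  \;\le\; C_{\mathrm{stab}} \Bigl(
      \underbrace{\varepsilon_{\mathrm{approx}}}_{\substack{\text{best residual reachable}\\ \text{by the network class}}}
    + \underbrace{\varepsilon_{\mathrm{gen}}}_{\substack{\text{gap between continuous loss}\\ \text{and collocation-point loss}}}
    + \underbrace{\varepsilon_{\mathrm{train}}}_{\substack{\text{optimization gap left}\\ \text{by the trainer}}}
  \Bigr),
\]
% with the last term -- the training error -- identified in the article
% as the key practical bottleneck.
```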