Goto

Collaborating Authors

 Schaeffer, Hayden


Cauchy Random Features for Operator Learning in Sobolev Space

arXiv.org Machine Learning

Operator learning is the approximation of operators between infinite dimensional Banach spaces using machine learning approaches. While most progress in this area has been driven by variants of deep neural networks such as the Deep Operator Network and Fourier Neural Operator, the theoretical guarantees are often in the form of a universal approximation property. However, the existence theorems do not guarantee that an accurate operator network is obtainable in practice. Motivated by the recent kernel-based operator learning framework, we propose a random feature operator learning method with theoretical guarantees and error bounds. The random feature method can be viewed as a randomized approximation of a kernel method, which significantly reduces the computation requirements for training. We provide a generalization error analysis for our proposed random feature operator learning method along with comprehensive numerical results. Compared to kernel-based method and neural network methods, the proposed method can obtain similar or better test errors across benchmarks examples with significantly reduced training times. An additional advantages it that our implementation is simple and does require costly computational resources, such as GPU.


A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions

arXiv.org Artificial Intelligence

Neural networks are one tool for approximating non-linear differential equations used in scientific computing tasks such as surrogate modeling, real-time predictions, and optimal control. PDE foundation models utilize neural networks to train approximations to multiple differential equations simultaneously and are thus a general purpose solver that can be adapted to downstream tasks. Current PDE foundation models focus on either learning general solution operators and/or the governing system of equations, and thus only handle numerical or symbolic modalities. However, real-world applications may require more flexible data modalities, e.g. text analysis or descriptive outputs. To address this gap, we propose a novel multimodal deep learning approach that leverages a transformer-based architecture to approximate solution operators for a wide variety of ODEs and PDEs. Our method integrates numerical inputs, such as equation parameters and initial conditions, with text descriptions of physical processes or system dynamics. This enables our model to handle settings where symbolic representations may be incomplete or unavailable. In addition to providing accurate numerical predictions, our approach generates interpretable scientific text descriptions, offering deeper insights into the underlying dynamics and solution properties. The numerical experiments show that our model provides accurate solutions for in-distribution data (with average relative error less than 3.3%) and out-of-distribution data (average relative error less than 7.8%) together with precise text descriptions (with correct descriptions generated 100% of times). In certain tests, the model is also shown to be capable of extrapolating solutions in time.


BCAT: A Block Causal Transformer for PDE Foundation Models for Fluid Dynamics

arXiv.org Artificial Intelligence

We introduce BCAT, a PDE foundation model designed for autoregressive prediction of solutions to two dimensional fluid dynamics problems. Our approach uses a block causal transformer architecture to model next frame predictions, leveraging previous frames as contextual priors rather than relying solely on sub-frames or pixel-based inputs commonly used in image generation methods. This block causal framework more effectively captures the spatial dependencies inherent in nonlinear spatiotemporal dynamics and physical phenomena. In an ablation study, next frame prediction demonstrated a 2.9x accuracy improvement over next token prediction. BCAT is trained on a diverse range of fluid dynamics datasets, including incompressible and compressible Navier-Stokes equations across various geometries and parameter regimes, as well as the shallow-water equations. The model's performance was evaluated on 6 distinct downstream prediction tasks and tested on about 8K trajectories to measure robustness on a variety of fluid dynamics simulations. BCAT achieved an average relative error of 1.92% across all evaluation tasks, outperforming prior approaches on standard benchmarks.


VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction

arXiv.org Artificial Intelligence

In-Context Operator Networks (ICONs) are models that learn operators across different types of PDEs using a few-shot, in-context approach. Although they show successful generalization to various PDEs, existing methods treat each data point as a single token, and suffer from computational inefficiency when processing dense data, limiting their application in higher spatial dimensions. In this work, we propose Vision In-Context Operator Networks (VICON), incorporating a vision transformer architecture that efficiently processes 2D functions through patch-wise operations. We evaluated our method on three fluid dynamics datasets, demonstrating both superior performance (reducing scaled $L^2$ error by $40\%$ and $61.6\%$ for two benchmark datasets for compressible flows, respectively) and computational efficiency (requiring only one-third of the inference time per frame) in long-term rollout predictions compared to the current state-of-the-art sequence-to-sequence model with fixed timestep prediction: Multiple Physics Pretraining (MPP). Compared to MPP, our method preserves the benefits of in-context operator learning, enabling flexible context formation when dealing with insufficient frame counts or varying timestep values.


DeepONet as a Multi-Operator Extrapolation Model: Distributed Pretraining with Physics-Informed Fine-Tuning

arXiv.org Artificial Intelligence

We propose a novel fine-tuning method to achieve multi-operator learning through training a distributed neural operator with diverse function data and then zero-shot fine-tuning the neural network using physics-informed losses for downstream tasks. Operator learning effectively approximates solution operators for PDEs and various PDE-related problems, yet it often struggles to generalize to new tasks. To address this, we investigate fine-tuning a pretrained model, while carefully selecting an initialization that enables rapid adaptation to new tasks with minimal data. Our approach combines distributed learning to integrate data from various operators in pre-training, while physics-informed methods enable zero-shot fine-tuning, minimizing the reliance on downstream data. We investigate standard fine-tuning and Low-Rank Adaptation fine-tuning, applying both to train complex nonlinear target operators that are difficult to learn only using random initialization. Through comprehensive numerical examples, we demonstrate the advantages of our approach, showcasing significant improvements in accuracy. Our findings provide a robust framework for advancing multi-operator learning and highlight the potential of transfer learning techniques in this domain.


Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study

arXiv.org Machine Learning

Neural scaling laws play a pivotal role in the performance of deep neural networks and have been observed in a wide range of tasks. However, a complete theoretical framework for understanding these scaling laws remains underdeveloped. In this paper, we explore the neural scaling laws for deep operator networks, which involve learning mappings between function spaces, with a focus on the Chen and Chen style architecture. These approaches, which include the popular Deep Operator Network (DeepONet), approximate the output functions using a linear combination of learnable basis functions and coefficients that depend on the input functions. We establish a theoretical framework to quantify the neural scaling laws by analyzing its approximation and generalization errors. We articulate the relationship between the approximation and generalization errors of deep operator networks and key factors such as network model size and training data size. Moreover, we address cases where input functions exhibit low-dimensional structures, allowing us to derive tighter error bounds. These results also hold for deep ReLU networks and other similar structures. Our results offer a partial explanation of the neural scaling laws in operator learning and provide a theoretical foundation for their applications.


Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation

arXiv.org Artificial Intelligence

Foundation models, such as large language models, have demonstrated success in addressing various language and image processing tasks. In this work, we introduce a multi-modal foundation model for scientific problems, named PROSE-PDE. Our model, designed for bi-modality to bi-modality learning, is a multi-operator learning approach which can predict future states of spatiotemporal systems while concurrently learning the underlying governing equations of the physical system. Specifically, we focus on multi-operator learning by training distinct one-dimensional time-dependent nonlinear constant coefficient partial differential equations, with potential applications to many physical applications including physics, geology, and biology. More importantly, we provide three extrapolation studies to demonstrate that PROSE-PDE can generalize physical features through the robust training of multiple operators and that the proposed model can extrapolate to predict PDE solutions whose models or data were unseen during the training. Furthermore, we show through systematic numerical experiments that the utilization of the symbolic modality in our model effectively resolves the well-posedness problems with training multiple operators and thus enhances our model's predictive capabilities.


D2NO: Efficient Handling of Heterogeneous Input Function Spaces with Distributed Deep Neural Operators

arXiv.org Artificial Intelligence

Neural operators have been applied in various scientific fields, such as solving parametric partial differential equations, dynamical systems with control, and inverse problems. However, challenges arise when dealing with input functions that exhibit heterogeneous properties, requiring multiple sensors to handle functions with minimal regularity. To address this issue, discretization-invariant neural operators have been used, allowing the sampling of diverse input functions with different sensor locations. However, existing frameworks still require an equal number of sensors for all functions. In our study, we propose a novel distributed approach to further relax the discretization requirements and solve the heterogeneous dataset challenges. Our method involves partitioning the input function space and processing individual input functions using independent and separate neural networks. A centralized neural network is used to handle shared information across all output functions. This distributed methodology reduces the number of gradient descent back-propagation steps, improving efficiency while maintaining accuracy. We demonstrate that the corresponding neural network is a universal approximator of continuous nonlinear operators and present four numerical examples to validate its performance.


PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers

arXiv.org Artificial Intelligence

Approximating nonlinear differential equations using a neural network provides a robust and efficient tool for various scientific computing tasks, including real-time predictions, inverse problems, optimal controls, and surrogate modeling. Previous works have focused on embedding dynamical systems into networks through two approaches: learning a single solution operator (i.e., the mapping from input parametrized functions to solutions) or learning the governing system of equations (i.e., the constitutive model relative to the state variables). Both of these approaches yield different representations for the same underlying data or function. Additionally, observing that families of differential equations often share key characteristics, we seek one network representation across a wide range of equations. Our method, called Predicting Operators and Symbolic Expressions (PROSE), learns maps from multimodal inputs to multimodal outputs, capable of generating both numerical predictions and mathematical equations. By using a transformer structure and a feature fusion approach, our network can simultaneously embed sets of solution operators for various parametric differential equations using a single trained network. Detailed experiments demonstrate that the network benefits from its multimodal nature, resulting in improved prediction accuracy and better generalization. The network is shown to be able to handle noise in the data and errors in the symbolic representation, including noisy numerical values, model misspecification, and erroneous addition or deletion of terms. PROSE provides a new neural network framework for differential equations which allows for more flexibility and generality in learning operators and governing equations from data.


SRMD: Sparse Random Mode Decomposition

arXiv.org Machine Learning

Time-frequency analysis is an important tool for analyzing and leveraging information from signals. Various time-frequency approaches lead to a particular decomposition of signals into important time-varying features which can illuminate intrinsic behaviors, be used for analytics, or assist in smoothing the signal. For example, the classical short-time Fourier transform (STFT) extracts a time-varying representation by localizing the signal in time using a finite window length and applying the Fourier transform to each localized segment. Since the window length is fixed, the resulting analysis leads to a uniform time-frequency resolution. The continuous wavelet transform (CWT) constructs a representation of the signal using a variable time window by scaling the mother wavelet and thus leads to a multiscale representation of the signal.