
Collaborating Authors: Stinis, Panos


What do physics-informed DeepONets learn? Understanding and improving training for scientific computing applications

arXiv.org Artificial Intelligence

Physics-informed deep operator networks (DeepONets) have emerged as a promising approach toward numerically approximating the solution of partial differential equations (PDEs). In this work, we aim to develop further understanding of what is being learned by physics-informed DeepONets by assessing the universality of the extracted basis functions and demonstrating their potential toward model reduction with spectral methods. Results provide clarity about measuring the performance of a physics-informed DeepONet through the decays of singular values and expansion coefficients. In addition, we propose a transfer learning approach for improving training for physics-informed DeepONets between parameters of the same PDE as well as across different, but related, PDEs where these models struggle to train well. This approach results in significant error reduction and learned basis functions that are more effective in representing the solution of a PDE.
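
As a rough illustration of how one might probe the learned basis, the following minimal sketch evaluates a trunk network on a grid and inspects the decay of the singular values of the resulting basis matrix; the tiny random MLP is a hypothetical placeholder for a trained physics-informed DeepONet trunk.

```python
import numpy as np

rng = np.random.default_rng(0)
width, p = 64, 32                               # hidden width and number of basis functions

# Hypothetical stand-in for a trained DeepONet trunk network (1D input).
W1 = rng.standard_normal((1, width)); b1 = rng.standard_normal(width)
W2 = rng.standard_normal((width, p)); b2 = rng.standard_normal(p)

def trunk(x):
    """Map coordinates x of shape (n, 1) to p basis-function values."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

x = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
B = trunk(x)                                    # each column is one basis function on the grid

# Fast singular value decay indicates the learned basis spans a low-dimensional
# space; the decay of these values is one way to assess DeepONet performance.
s = np.linalg.svd(B, compute_uv=False)
print("leading normalized singular values:", np.round(s[:8] / s[0], 4))
```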


SPIKANs: Separable Physics-Informed Kolmogorov-Arnold Networks

arXiv.org Artificial Intelligence

Physics-Informed Neural Networks (PINNs) have emerged as a promising method for solving partial differential equations (PDEs) in scientific computing. While PINNs typically use multilayer perceptrons (MLPs) as their underlying architecture, recent advancements have explored alternative neural network structures. One such innovation is the Kolmogorov-Arnold Network (KAN), which has demonstrated benefits over traditional MLPs, including faster neural scaling and better interpretability. The application of KANs to physics-informed learning has led to the development of Physics-Informed KANs (PIKANs), enabling the use of KANs to solve PDEs. However, despite their advantages, KANs often suffer from slower training speeds, particularly in higher-dimensional problems where the number of collocation points grows exponentially with the dimensionality of the system. To address this challenge, we introduce Separable Physics-Informed Kolmogorov-Arnold Networks (SPIKANs). This novel architecture applies the principle of separation of variables to PIKANs, decomposing the problem such that each dimension is handled by an individual KAN. This approach drastically reduces the computational complexity of training without sacrificing accuracy, facilitating their application to higher-dimensional PDEs. Through a series of benchmark problems, we demonstrate the effectiveness of SPIKANs, showcasing their superior scalability and performance compared to PIKANs and highlighting their potential for solving complex, high-dimensional PDEs in scientific computing.
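
A minimal sketch of the separation-of-variables idea is given below: each dimension gets its own one-dimensional factor and the field is assembled as a sum of products. The Chebyshev feature maps stand in for individual KANs, and the coefficients are random placeholders rather than trained values.

```python
import numpy as np

# Approximate u(x, y) as a sum of `rank` products f_k(x) * g_k(y), where each
# 1D factor would be its own KAN.  Here the per-dimension factors are simple
# Chebyshev feature maps (a stand-in, not an actual KAN).

def cheb_features(t, degree=8):
    """Chebyshev polynomial features T_0..T_degree evaluated at t in [-1, 1]."""
    return np.polynomial.chebyshev.chebvander(t, degree)   # (n, degree + 1)

rng = np.random.default_rng(0)
nx, ny, rank = 40, 40, 6
x = np.linspace(-1, 1, nx)
y = np.linspace(-1, 1, ny)

# Hypothetical learned coefficients mapping 1D features to `rank` factors.
Cx = 0.1 * rng.standard_normal((9, rank))
Cy = 0.1 * rng.standard_normal((9, rank))

Fx = cheb_features(x) @ Cx         # (nx, rank): factors along x
Fy = cheb_features(y) @ Cy         # (ny, rank): factors along y

# Outer-product assembly: u[i, j] = sum_k Fx[i, k] * Fy[j, k].
# Collocation work scales with nx + ny rather than nx * ny.
U = np.einsum("ik,jk->ij", Fx, Fy)
print(U.shape)                     # (40, 40)
```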


Multifidelity Kolmogorov-Arnold Networks

arXiv.org Artificial Intelligence

In recent years, scientific machine learning (SciML) has emerged as a paradigm for modeling physical systems [1, 2, 3]. Typically built on multilayer perceptrons (MLPs), SciML has shown great success in modeling a wide range of applications; however, data-informed training struggles when high-quality data is not available. Kolmogorov-Arnold networks (KANs) have recently been developed as an alternative to MLPs [4, 5]. KANs take the Kolmogorov-Arnold Theorem as inspiration and can offer advantages over MLPs in some cases, such as for discovering interpretable models. However, KANs have been shown to struggle to reach the accuracy of MLPs, particularly without modifications [6, 7, 8, 9]. In the short time since the publication of [4], many variations of KANs have been developed, including physics-informed KANs (PIKANs) [9], KAN-informed neural networks (KINNs) [10], temporal KANs [11], wavelet KANs [12], graph KANs [13, 14, 15], Chebyshev KANs (cKANs) [16], convolutional KANs [17], ReLU-KANs [18], Higher-order-ReLU-KANs (HRKANs) [19], fractional KANs [20], finite basis KANs [21], deep operator KANs [22], and others.
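
For readers unfamiliar with the architecture, the following minimal sketch shows what a single KAN-style layer computes, using a Chebyshev expansion on each edge in the spirit of the cKAN variant cited above; the coefficients are random placeholders rather than trained parameters.

```python
import numpy as np

# A KAN-style layer: every edge (i, j) carries its own learnable univariate
# function, here expanded in a Chebyshev basis.  Output j sums the edge
# functions applied to each input i.

def kan_layer(x, coeffs):
    """x: (n, d_in); coeffs: (d_in, d_out, degree + 1) -> (n, d_out)."""
    t = np.tanh(x)                                      # squash inputs into [-1, 1]
    T = np.polynomial.chebyshev.chebvander(t, coeffs.shape[-1] - 1)  # (n, d_in, deg + 1)
    # out[n, j] = sum_{i, k} T[n, i, k] * coeffs[i, j, k]
    return np.einsum("nik,ijk->nj", T, coeffs)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))                         # 5 samples, 3 inputs
coeffs = 0.1 * rng.standard_normal((3, 4, 6))           # 3 inputs, 4 outputs, degree 5
print(kan_layer(x, coeffs).shape)                       # (5, 4)
```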


Finite basis Kolmogorov-Arnold networks: domain decomposition for data-driven and physics-informed problems

arXiv.org Artificial Intelligence

Kolmogorov-Arnold networks (KANs) have attracted attention recently as an alternative to multilayer perceptrons (MLPs) for scientific machine learning. However, KANs can be expensive to train, even for relatively small networks. Inspired by finite basis physics-informed neural networks (FBPINNs), in this work, we develop a domain decomposition method for KANs that allows for several small KANs to be trained in parallel to give accurate solutions for multiscale problems. We show that finite basis KANs (FBKANs) can provide accurate results with noisy data and for physics-informed training.
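
The sketch below illustrates the FBPINN-style construction that FBKANs build on, assuming overlapping subdomains blended by smooth windows normalized to a partition of unity; the cosine windows and the toy per-subdomain models are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

# Split the domain into overlapping subdomains, each with its own small model,
# and blend the subdomain outputs by smooth windows that sum to one.

def window(x, center, half_width):
    """Smooth bump supported on [center - half_width, center + half_width]."""
    t = np.clip((x - center) / half_width, -1.0, 1.0)
    return np.cos(0.5 * np.pi * t) ** 2

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 400)
centers = np.linspace(0.0, 1.0, 5)           # 5 overlapping subdomains
half_width = 0.3

def subdomain_model(x, k):
    """Random sinusoid standing in for the small KAN on subdomain k."""
    return np.sin((k + 1) * np.pi * x) * rng.standard_normal()

w = np.stack([window(x, c, half_width) for c in centers])      # (5, 400)
w = w / np.maximum(w.sum(axis=0, keepdims=True), 1e-12)        # partition of unity
u = sum(w[k] * subdomain_model(x, k) for k in range(len(centers)))
print(u.shape)                                                 # (400,)
```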


Self-adaptive weights based on balanced residual decay rate for physics-informed neural networks and deep operator networks

arXiv.org Machine Learning

Physics-informed deep learning has emerged as a promising alternative for solving partial differential equations. However, for complex problems, training these networks can still be challenging, often resulting in unsatisfactory accuracy and efficiency. In this work, we demonstrate that the failure of plain physics-informed neural networks arises from the significant discrepancy in the convergence speed of residuals at different training points, where the slowest convergence speed dominates the overall solution convergence. Based on these observations, we propose a point-wise adaptive weighting method that balances the residual decay rate across different training points. The performance of our proposed adaptive weighting method is compared with current state-of-the-art adaptive weighting methods on benchmark problems for both physics-informed neural networks and physics-informed deep operator networks. Through extensive numerical results, we demonstrate that our proposed approach of balanced residual decay rates offers several advantages, including bounded weights, high prediction accuracy, fast convergence speed, low training uncertainty, low computational cost, and ease of hyperparameter tuning.
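
The abstract does not spell out the weighting rule, so the following minimal sketch only illustrates the general idea under an assumed update: each collocation point's weight is set inversely proportional to a running estimate of its residual decay rate, so slowly converging points are emphasized. The exponential-moving-average estimate and the synthetic residual histories are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_steps = 8, 50
rates = rng.uniform(0.01, 0.2, size=n_points)              # synthetic per-point decay rates
residuals = np.exp(-np.outer(np.arange(n_steps), rates))   # (n_steps, n_points)

ema_decay = np.zeros(n_points)
beta = 0.9
for step in range(1, n_steps):
    # Instantaneous decay rate of each point's residual (larger = faster decay).
    inst = np.log(residuals[step - 1] / residuals[step] + 1e-12)
    ema_decay = beta * ema_decay + (1 - beta) * inst

# Slowly decaying points receive larger weights; normalization keeps weights bounded.
weights = 1.0 / (ema_decay + 1e-8)
weights = n_points * weights / weights.sum()
print(np.round(weights, 2))
```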


Efficient kernel surrogates for neural network-based regression

arXiv.org Artificial Intelligence

Despite their immense promise in performing a variety of learning tasks, a theoretical understanding of the limitations of Deep Neural Networks (DNNs) has so far eluded practitioners. This is partly due to the inability to determine the closed forms of the learned functions, making it harder to study their generalization properties on unseen datasets. Recent work has shown that randomly initialized DNNs in the infinite-width limit converge to kernel machines relying on a Neural Tangent Kernel (NTK) with known closed form. These results suggest, and experimental evidence corroborates, that empirical kernel machines can also act as surrogates for finite-width DNNs. The high computational cost of assembling the full NTK, however, makes this approach infeasible in practice, motivating the need for low-cost approximations. In the current work, we study the performance of the Conjugate Kernel (CK), an efficient approximation to the NTK that has been observed to yield fairly similar results. For regression of smooth functions and for logistic regression classification, we show that the CK performance is only marginally worse than that of the NTK and, in certain cases, even superior. In particular, we establish bounds for the relative test losses, verify them with numerical tests, and identify the regularity of the kernel as the key determinant of performance. In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively. We present a demonstration of this on the foundation model GPT-2 by comparing its performance on a classification task using a conventional approach and our prescription. We also show how our approach can be used to improve physics-informed operator network training for regression tasks as well as convolutional neural network training for vision classification tasks.
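
A common way to realize the CK is as the Gram matrix of the last-hidden-layer features of a randomly initialized network; the minimal sketch below assembles that kernel for a toy 1D regression and uses it in kernel ridge regression. The architecture, widths, and regularization here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
widths = (1, 64, 64)                               # input dim 1, two hidden layers
Ws = [rng.standard_normal((widths[i], widths[i + 1])) / np.sqrt(widths[i])
      for i in range(len(widths) - 1)]

def features(x):
    """Last-hidden-layer features of the fixed random MLP, x of shape (n, 1)."""
    h = x
    for W in Ws:
        h = np.tanh(h @ W)
    return h

# Smooth 1D regression target.
x_train = np.linspace(-1, 1, 50).reshape(-1, 1)
y_train = np.sin(np.pi * x_train).ravel()
x_test = np.linspace(-1, 1, 200).reshape(-1, 1)

phi_train, phi_test = features(x_train), features(x_test)
K = phi_train @ phi_train.T                        # CK Gram matrix on training points
k_star = phi_test @ phi_train.T                    # cross-kernel, test vs. train
alpha = np.linalg.solve(K + 1e-6 * np.eye(K.shape[0]), y_train)
y_pred = k_star @ alpha
print("max abs test error:", np.abs(y_pred - np.sin(np.pi * x_test).ravel()).max())
```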


Multifidelity domain decomposition-based physics-informed neural networks for time-dependent problems

arXiv.org Artificial Intelligence

Multiscale problems are challenging for neural network-based discretizations of differential equations, such as physics-informed neural networks (PINNs). This can be (partly) attributed to the so-called spectral bias of neural networks. To improve the performance of PINNs for time-dependent problems, a combination of multifidelity stacking PINNs and domain decomposition-based finite basis PINNs is employed. In particular, a domain decomposition in time is used to learn the high-fidelity part of the multifidelity model. The performance is investigated for a pendulum and a two-frequency problem as well as the Allen-Cahn equation. It can be observed that the domain decomposition approach clearly improves the PINN and stacking PINN approaches.
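
A minimal sketch of this combination is given below, assuming the high-fidelity correction is blended from per-window models by cosine partition-of-unity windows in time (the same blending device as in the FBKAN sketch above); the toy low-fidelity model and per-window corrections are placeholders for trained networks.

```python
import numpy as np

t = np.linspace(0.0, 10.0, 1000)

def low_fidelity(t):
    """Stand-in for the previous stacking level's prediction over all of [0, 10]."""
    return np.sin(t)

def window(t, center, half_width):
    s = np.clip((t - center) / half_width, -1.0, 1.0)
    return np.cos(0.5 * np.pi * s) ** 2

centers = np.linspace(0.0, 10.0, 6)                       # overlapping time windows
half_width = 1.5
w = np.stack([window(t, c, half_width) for c in centers])
w = w / np.maximum(w.sum(axis=0, keepdims=True), 1e-12)   # partition of unity in time

def correction(t, k):
    """Stand-in for the small network owning time window k."""
    return 0.1 * np.sin((k + 2) * t)

# High-fidelity prediction = low-fidelity prediction + windowed corrections.
u = low_fidelity(t) + sum(w[k] * correction(t, k) for k in range(len(centers)))
print(u.shape)                                            # (1000,)
```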


Multifidelity Deep Operator Networks For Data-Driven and Physics-Informed Problems

arXiv.org Artificial Intelligence

In general, low-fidelity data is easier to obtain in greater quantities, but it may be too inaccurate or not dense enough to accurately train a machine learning model. High-fidelity data is more accurate but costly to obtain, so there may not be sufficient data for training. A small amount of high-fidelity data, such as from measurements, can improve predictions when combined with low-fidelity data; this has motivated geophysicists to develop cokriging [1], which is based on Gaussian process regression at two fidelity levels and exploits correlations (albeit only linear ones) between the levels. An example of cokriging for obtaining the sea surface temperature (as well as the associated uncertainty) is presented in [2], where satellite images are used as low-fidelity data and in situ measurements as high-fidelity data. To exploit nonlinear correlations between fidelity levels, a probabilistic framework based on Gaussian process regression and a nonlinear autoregressive scheme was proposed in [3]; it can learn complex nonlinear and space-dependent cross-correlations between multifidelity models. The limitation of this approach is its high computational cost for big data sets, and to this end, the subsequent work in [4], based on neural networks, provided the first method for multifidelity training of deep neural networks.
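
The multifidelity network composition of [4] is often written as a linear correlation plus a nonlinear correction, both taking the coordinates and the low-fidelity prediction as input. The sketch below illustrates that structure with a least-squares linear fit and a small random-feature regression standing in for the two networks; all models and data here are toy assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def u_low(x):                        # cheap, plentiful, biased model
    return np.sin(2 * np.pi * x)

def u_high(x):                       # expensive "truth", nonlinearly related to u_low
    return (x - 0.5) * np.sin(2 * np.pi * x) ** 2

x_hf = np.linspace(0.0, 1.0, 12)     # only a handful of high-fidelity samples
y_hf = u_high(x_hf)

# Linear correlation: y ~ a * u_L(x) + b * x + c, fit by least squares.
A = np.column_stack([u_low(x_hf), x_hf, np.ones_like(x_hf)])
coef, *_ = np.linalg.lstsq(A, y_hf, rcond=None)

# Nonlinear correction of the residual via random Fourier features of (x, u_L(x)).
W = rng.standard_normal((2, 10))
b = rng.uniform(0.0, 2.0 * np.pi, 10)

def feats(x):
    z = np.column_stack([x, u_low(x)])
    return np.cos(z @ W + b)

w_nl, *_ = np.linalg.lstsq(feats(x_hf), y_hf - A @ coef, rcond=None)

x_test = np.linspace(0.0, 1.0, 200)
A_test = np.column_stack([u_low(x_test), x_test, np.ones_like(x_test)])
y_pred = A_test @ coef + feats(x_test) @ w_nl
print("max abs test error:", np.abs(y_pred - u_high(x_test)).max())
```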


Stacked networks improve physics-informed training: applications to neural networks and deep operator networks

arXiv.org Artificial Intelligence

Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations. We present a novel multifidelity framework for stacking physics-informed neural networks and operator networks that facilitates training. We successively build a chain of networks, where the output at one step can act as a low-fidelity input for training the next step, gradually increasing the expressivity of the learned model. The equations imposed at each step of the iterative process can be the same or different (akin to simulated annealing). The iterative (stacking) nature of the proposed method allows us to progressively learn features of a solution that are hard to learn directly. Through benchmark problems including a nonlinear pendulum, the wave equation, and the viscous Burgers equation, we show how stacking can be used to improve the accuracy and reduce the required size of physics-informed neural networks and operator networks.
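
The stacking loop can be sketched as follows: each level receives the previous level's output as a low-fidelity input and learns only a correction. The per-level Fourier least-squares fits and the data-driven target below are stand-ins for physics-informed networks and PDE residuals, purely for illustration.

```python
import numpy as np

x = np.linspace(0, 1, 400)
target = np.sin(2 * np.pi * x) + 0.3 * np.sin(16 * np.pi * x)   # multi-scale target

def fourier_basis(x, n_modes):
    cols = [np.ones_like(x)]
    for k in range(1, n_modes + 1):
        cols += [np.sin(2 * np.pi * k * x), np.cos(2 * np.pi * k * x)]
    return np.column_stack(cols)

u = np.zeros_like(x)                            # level-0 prediction
for level, n_modes in enumerate([2, 6, 20]):    # increasing expressivity per level
    B = fourier_basis(x, n_modes)
    coef, *_ = np.linalg.lstsq(B, target - u, rcond=None)   # learn a correction
    u = u + B @ coef                            # output feeds the next level as low-fidelity input
    print(f"level {level}: max abs error = {np.abs(u - target).max():.3e}")
```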


Exploring Learned Representations of Neural Networks with Principal Component Analysis

arXiv.org Artificial Intelligence

Understanding feature representation for deep neural networks (DNNs) remains an open question within the general field of explainable AI. We use principal component analysis (PCA) to study the performance of a k-nearest neighbors classifier (k-NN), nearest class-centers classifier (NCC), and support vector machines on the learned layer-wise representations of a ResNet-18 trained on CIFAR-10. We show that in certain layers, as little as 20% of the intermediate feature-space variance is necessary for high-accuracy classification and that across all layers, the first ~100 PCs completely determine the performance of the k-NN and NCC classifiers. We relate our findings to neural collapse and provide partial evidence for the related phenomenon of intermediate neural collapse. Our preliminary work provides three distinct yet interpretable surrogate models for feature representation, with an affine linear model performing best. We also show that leveraging several surrogate models affords us a clever method to estimate where neural collapse may initially occur within the DNN.
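
The probing protocol can be sketched with standard scikit-learn tools: project a layer's features onto their leading principal components and measure k-NN accuracy as more components are retained. The Gaussian-blob features below are a stand-in for actual ResNet-18 activations on CIFAR-10.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_per_class, n_classes, dim = 200, 10, 512
centers = 3.0 * rng.standard_normal((n_classes, dim))
X = np.vstack([c + rng.standard_normal((n_per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Track k-NN accuracy and explained variance as more principal components are kept.
for n_pc in [2, 10, 50, 100]:
    pca = PCA(n_components=n_pc).fit(X_tr)
    knn = KNeighborsClassifier(n_neighbors=10).fit(pca.transform(X_tr), y_tr)
    acc = knn.score(pca.transform(X_te), y_te)
    var = pca.explained_variance_ratio_.sum()
    print(f"{n_pc:3d} PCs: {100 * var:5.1f}% variance, k-NN accuracy {acc:.3f}")
```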