Collaborating Authors

Toscano, Juan Diego


KKANs: Kurkova-Kolmogorov-Arnold Networks and Their Learning Dynamics

arXiv.org Machine Learning

Inspired by the Kolmogorov-Arnold representation theorem and Kurkova's principle of using approximate representations, we propose the Kurkova-Kolmogorov-Arnold Network (KKAN), a new two-block architecture that combines robust multi-layer perceptron (MLP) based inner functions with flexible linear combinations of basis functions as outer functions. We first prove that KKAN is a universal approximator, and then we demonstrate its versatility across scientific machine-learning applications, including function regression, physics-informed machine learning (PIML), and operator-learning frameworks. The benchmark results show that KKANs outperform MLPs and the original Kolmogorov-Arnold Networks (KANs) in function-approximation and operator-learning tasks, and achieve performance comparable to fully optimized MLPs for PIML. To better understand the behavior of the new representation models, we analyze their geometric complexity and learning dynamics using information bottleneck theory, identifying three universal learning stages (fitting, transition, and diffusion) across all architectures. We find a strong correlation between geometric complexity and signal-to-noise ratio (SNR), with optimal generalization achieved during the diffusion stage. Additionally, we propose self-scaled residual-based attention weights to maintain a high SNR dynamically, ensuring uniform convergence and prolonged learning.
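
To make the two-block idea concrete, the sketch below implements a KKAN-style layer, assuming a tanh MLP for the inner block and a Chebyshev polynomial basis for the outer block; both choices, and all sizes, are illustrative assumptions rather than the authors' reference implementation.

```python
import torch
import torch.nn as nn

class KKANLayer(nn.Module):
    """Sketch of a two-block KKAN-style layer: an MLP inner block followed by
    learnable linear combinations of basis functions as the outer block."""
    def __init__(self, in_dim, out_dim, hidden=32, n_inner=16, degree=5):
        super().__init__()
        # Inner block: a standard MLP producing n_inner intermediate features in [-1, 1]
        self.inner = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_inner), nn.Tanh(),
        )
        # Outer block: per-output learnable coefficients over a polynomial basis
        self.coeff = nn.Parameter(0.1 * torch.randn(out_dim, n_inner, degree + 1))
        self.degree = degree

    def forward(self, x):
        z = self.inner(x)                                  # (batch, n_inner)
        # Chebyshev basis T_0..T_degree of each inner feature (illustrative choice)
        basis = [torch.ones_like(z), z]
        for _ in range(2, self.degree + 1):
            basis.append(2 * z * basis[-1] - basis[-2])
        B = torch.stack(basis, dim=-1)                     # (batch, n_inner, degree+1)
        # Outer functions: linear combination over inner features and basis terms
        return torch.einsum('bnd,ond->bo', B, self.coeff)  # (batch, out_dim)
```

The layer maps a batch of shape (batch, in_dim) to (batch, out_dim), so it can stand in for an MLP block in the regression, PIML, and operator-learning settings mentioned above.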


From PINNs to PIKANs: Recent Advances in Physics-Informed Machine Learning

arXiv.org Artificial Intelligence

Physics-Informed Neural Networks (PINNs) have emerged as a key tool in Scientific Machine Learning since their introduction in 2017, enabling the efficient solution of ordinary and partial differential equations using sparse measurements. Over the past few years, significant advancements have been made in the training and optimization of PINNs, covering aspects such as network architectures, adaptive refinement, domain decomposition, and the use of adaptive weights and activation functions. A notable recent development is Physics-Informed Kolmogorov-Arnold Networks (PIKANs), which leverage a representation model originally proposed by Kolmogorov in 1957, offering a promising alternative to traditional PINNs. In this review, we provide a comprehensive overview of the latest advancements in PINNs, focusing on improvements in network design, feature expansion, optimization techniques, uncertainty quantification, and theoretical insights. We also survey key applications across a range of fields, including biomedicine, fluid and solid mechanics, geophysics, dynamical systems, heat transfer, chemical engineering, and beyond. Finally, we review computational frameworks and software tools developed by both academia and industry to support PINN research and applications.
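
As a concrete reminder of the physics-informed loss at the core of the methods surveyed here, the sketch below assembles a PINN loss for a 1D Poisson problem u''(x) = f(x) with homogeneous Dirichlet boundary conditions; the network size, manufactured source term, and collocation sampling are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal PINN sketch for u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0.
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))

def pde_residual(x):
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    f = -(torch.pi ** 2) * torch.sin(torch.pi * x)   # manufactured source term
    return d2u - f                                   # pointwise PDE residual

x_f = torch.rand(128, 1)                 # collocation points in (0, 1)
x_b = torch.tensor([[0.0], [1.0]])       # boundary points
loss = pde_residual(x_f).pow(2).mean() + net(x_b).pow(2).mean()
loss.backward()                          # gradients for a standard optimizer step
```

Many of the advances reviewed (adaptive weights, feature expansion, PIKAN representations) amount to modifying either the network `net` or how the residual and boundary terms in this loss are weighted.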


A comprehensive and FAIR comparison between MLP and KAN representations for differential equations and operator networks

arXiv.org Artificial Intelligence

Kolmogorov-Arnold Networks (KANs) were recently introduced as an alternative representation model to MLPs. Herein, we employ KANs to construct physics-informed machine learning models (PIKANs) and deep operator models (DeepOKANs) for solving forward and inverse problems governed by differential equations. In particular, we compare them with physics-informed neural networks (PINNs) and deep operator networks (DeepONets), which are based on the standard MLP representation. We find that although the original KANs based on the B-spline parameterization lack accuracy and efficiency, modified versions based on low-order orthogonal polynomials achieve performance comparable to PINNs and DeepONets, although they still lack robustness, as they may diverge for different random seeds or for higher-order orthogonal polynomials. We visualize their corresponding loss landscapes and analyze their learning dynamics using information bottleneck theory. Our study follows the FAIR principles so that other researchers can use our benchmarks to further advance this emerging topic.
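
For readers unfamiliar with the operator-learning baseline in this comparison, the sketch below shows the standard DeepONet structure (a branch network over samples of the input function and a trunk network over the query coordinate, combined by an inner product); layer sizes are illustrative assumptions, and, roughly speaking, swapping the MLP sub-networks for KAN-type layers yields a DeepOKAN.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Vanilla DeepONet sketch: the branch net encodes the input function u
    sampled at m sensor locations, the trunk net encodes the query point y,
    and the operator output G(u)(y) is their inner product."""
    def __init__(self, m=100, p=64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(m, 128), nn.Tanh(), nn.Linear(128, p))
        self.trunk = nn.Sequential(nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, p))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors, y):
        # u_sensors: (batch, m) input-function samples; y: (batch, 1) query points
        b = self.branch(u_sensors)                             # (batch, p)
        t = self.trunk(y)                                      # (batch, p)
        return (b * t).sum(dim=-1, keepdim=True) + self.bias   # (batch, 1)
```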


Learning in PINNs: Phase transition, total diffusion, and generalization

arXiv.org Artificial Intelligence

Phase transitions in deep learning. The optimization process in deep learning can vary significantly in smoothness and convergence rate, depending on factors such as the complexity of the model, the quality and quantity of the data, or the characteristics of the loss landscape. For non-convex problems, however, this process has often been observed to be far from smooth and steady; instead, it is dominated by discrete, successive phases. Recent studies have shed light on several key aspects influencing these phases and the overall optimization dynamics [1-10]. The importance of gradient noise in escaping local optima in non-convex optimization has been explored, demonstrating its role in guaranteeing polynomial-time convergence to a global optimum [1]; the same work suggests the existence of a phase transition for a perturbed gradient descent (GD) algorithm, from escaping local optima to converging to a global solution as the artificial noise decreases. A later work highlighted a phenomenon called "super-convergence", in which models trained with a two-phase cyclical learning rate may achieve an improved regularization balance and generalization [2]. Furthermore, recent investigations have identified a two-phase learning regime for full-batch GD, characterized by distinct behaviors [3].

Figure 1 (caption): Phase transition in PINNs. The test error between the prediction and the exact solution converges faster after total diffusion (dashed lines), which occurs with an abrupt phase transition defined by homogeneous residuals. Although convergence begins at the onset of the diffusion phase, optimal training performance is reached when the gradients of different batches become equivalent, indicating general agreement on the direction of the optimizer steps (total diffusion).
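
One way to quantify the batch-gradient agreement that marks total diffusion is a layer-wise gradient signal-to-noise ratio, the norm of the mean per-batch gradient divided by the norm of its standard deviation; the sketch below computes this diagnostic, with `loss_fn` standing in for whatever training loss is used (an illustrative placeholder, not code from the paper).

```python
import torch

def gradient_snr(model, loss_fn, batches):
    """Per-parameter gradient SNR across mini-batches: ||mean g|| / ||std g||.
    High SNR suggests the fitting phase (batches agree on the step direction);
    low SNR suggests the diffusion phase. `loss_fn(model, batch)` is a placeholder."""
    per_batch = []
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        per_batch.append([p.grad.detach().clone() if p.grad is not None
                          else torch.zeros_like(p) for p in model.parameters()])
    snr = {}
    for i, (name, _) in enumerate(model.named_parameters()):
        g = torch.stack([grads[i] for grads in per_batch])  # (n_batches, *param_shape)
        snr[name] = (g.mean(0).norm() / (g.std(0).norm() + 1e-12)).item()
    return snr
```

Tracking this quantity during training makes the abrupt transition described above visible as a drop from high to low SNR, after which the test error converges fastest.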


Residual-based attention and connection to information bottleneck theory in PINNs

arXiv.org Artificial Intelligence

Driven by the need for more efficient and seamless integration of physical models and data, physics-informed neural networks (PINNs) have seen a surge of interest in recent years. However, ensuring the reliability of their convergence and accuracy remains a challenge. In this work, we propose an efficient, gradient-less weighting scheme for PINNs that accelerates the convergence of dynamic or static systems. This simple yet effective attention mechanism is a function of the evolving cumulative residuals and aims to make the optimizer aware of problematic regions at no extra computational cost and without adversarial learning. We illustrate that this general method consistently achieves a relative $L^{2}$ error of the order of $10^{-5}$ using standard optimizers on typical benchmark cases from the literature. Furthermore, by investigating the evolution of the weights during training, we identify two distinct learning phases reminiscent of the fitting and diffusion phases proposed by information bottleneck (IB) theory. Subsequent gradient analysis supports this hypothesis by aligning the transition from a high to a low signal-to-noise ratio (SNR) with the transition from the fitting to the diffusion regime of the adopted weights. This novel correlation between PINNs and IB theory could open future possibilities for understanding the underlying mechanisms behind the training and stability of PINNs and, more broadly, of neural operators.
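
A minimal sketch of the residual-based weighting described above is given below: the pointwise weights accumulate the normalized residual magnitudes with exponential decay, and they multiply the residuals in the loss without entering the computational graph (hence gradient-less). The decay factor and update rate shown are illustrative values, not the paper's settings.

```python
import torch

def update_rba_weights(weights, residuals, gamma=0.999, eta=0.01):
    """Residual-based attention sketch: weights grow where the PDE residual
    stays large, steering the optimizer toward problematic regions."""
    r = residuals.detach().abs()                  # detached: no extra gradients
    return gamma * weights + eta * r / (r.max() + 1e-12)

# Usage inside a training step (sketch):
#   weights  = torch.ones(n_collocation)           # initialized once
#   residual = pde_residual(x_f).squeeze()         # pointwise PDE residuals
#   weights  = update_rba_weights(weights, residual)
#   loss     = (weights * residual.pow(2)).mean()  # weighted residual loss
```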