Braga-Neto, Ulisses
DeepOSets: Non-Autoregressive In-Context Learning of Supervised Learning Operators
Chiu, Shao-Ting, Hong, Junyuan, Braga-Neto, Ulisses
We introduce DeepSets Operator Networks (DeepOSets), an efficient, non-autoregressive neural network architecture for in-context operator learning. In-context learning allows a trained machine learning model to learn from a user prompt without further training. DeepOSets adds in-context learning capabilities to Deep Operator Networks (DeepONets) by combining it with the DeepSets architecture. As the first non-autoregressive model for in-context operator learning, DeepOSets allow the user prompt to be processed in parallel, leading to significant computational savings. Here, we present the application of DeepOSets in the problem of learning supervised learning algorithms, which are operators mapping a finite-dimensional space of labeled data into an infinite-dimensional hypothesis space of prediction functions. In an empirical comparison with a popular autoregressive (transformer-based) model for in-context learning of linear regression in one and five dimensions, DeepOSets reduced the number of model weights by several orders of magnitude and required a fraction of training and inference time. Furthermore, DeepOSets proved to be less sensitive to noise, significantly outperforming the transformer model in noisy settings.
Generalized Resubstitution for Regression Error Estimation
Marcondes, Diego, Braga-Neto, Ulisses
We propose generalized resubstitution error estimators for regression, a broad family of estimators, each corresponding to a choice of empirical probability measures and loss function. The usual sum of squares criterion is a special case corresponding to the standard empirical probability measure and the quadratic loss. Other choices of empirical probability measure lead to more general estimators with superior bias and variance properties. We prove that these error estimators are consistent under broad assumptions. In addition, procedures for choosing the empirical measure based on the method of moments and maximum pseudo-likelihood are proposed and investigated. Detailed experimental results using polynomial regression demonstrate empirically the superior finite-sample bias and variance properties of the proposed estimators. The R code for the experiments is provided.
Label Propagation Training Schemes for Physics-Informed Neural Networks and Gaussian Processes
Zhong, Ming, Liu, Dehao, Arroyave, Raymundo, Braga-Neto, Ulisses
This paper proposes a semi-supervised methodology for training physics-informed machine learning methods. This includes self-training of physics-informed neural networks and physics-informed Gaussian processes in isolation, and the integration of the two via co-training. We demonstrate via extensive numerical experiments how these methods can ameliorate the issue of propagating information forward in time, which is a common failure mode of physics-informed machine learning.
Auto-PINN: Understanding and Optimizing Physics-Informed Neural Architecture
Wang, Yicheng, Han, Xiaotian, Chang, Chia-Yuan, Zha, Daochen, Braga-Neto, Ulisses, Hu, Xia
Physics-informed neural networks (PINNs) are revolutionizing science and engineering practice by bringing together the power of deep learning to bear on scientific computation. In forward modeling problems, PINNs are meshless partial differential equation (PDE) solvers that can handle irregular, high-dimensional physical domains. Naturally, the neural architecture hyperparameters have a large impact on the efficiency and accuracy of the PINN solver. However, this remains an open and challenging problem because of the large search space and the difficulty of identifying a proper search objective for PDEs. Here, we propose Auto-PINN, the first systematic, automated hyperparameter optimization approach for PINNs, which employs Neural Architecture Search (NAS) techniques to PINN design. Auto-PINN avoids manually or exhaustively searching the hyperparameter space associated with PINNs. A comprehensive set of pre-experiments using standard PDE benchmarks allows us to probe the structure-performance relationship in PINNs. We find that the different hyperparameters can be decoupled, and that the training loss function of PINNs is a good search objective. Comparison experiments with baseline methods demonstrate that Auto-PINN produces neural architectures with superior stability and accuracy over alternative baselines.
Characteristics-Informed Neural Networks for Forward and Inverse Hyperbolic Problems
Braga-Neto, Ulisses
We propose characteristics-informed neural networks (CINN), a simple and efficient machine learning approach for solving forward and inverse problems involving hyperbolic PDEs. Like physics-informed neural networks (PINN), CINN is a meshless machine learning solver with universal approximation capabilities. Unlike PINN, which enforces a PDE softly via a multi-part loss function, CINN encodes the characteristics of the PDE in a general-purpose deep neural network by adding a characteristic layer. This neural network is trained with the usual MSE data-fitting regression loss and does not require residual losses on collocation points. This leads to faster training and can avoid well-known pathologies of gradient descent optimization of multi-part PINN loss functions. This paper focuses on linear transport phenomena, in which case it is shown that, if the characteristic ODEs can be solved exactly, then the output of a CINN is an exact solution of the PDE, even at initialization, preventing the occurrence of non-physical solutions. In addition, a CINN can also be trained with soft penalty constraints that enforce, for example, periodic or Neumman boundary conditions, without losing the property that the output satisfies the PDE automatically. We also propose an architecture that extends the CINN approach to linear hyperbolic systems of PDEs. All CINN architectures proposed here can be trained end-to-end from sample data using standard deep learning software. Experiments with the simple advection equation, a stiff periodic advection equation, and an acoustics problem where data from one field is used to predict the other, unseen field, indicate that CINN is able to improve on the accuracy of the baseline PINN, in some cases by a considerable margin, while also being significantly faster to train and avoiding non-physical solutions. An extension to nonlinear PDEs is also briefly discussed.
Generalized Resubstitution for Classification Error Estimation
Ghane, Parisa, Braga-Neto, Ulisses
We propose the family of generalized resubstitution classifier error estimators based on empirical measures. These error estimators are computationally efficient and do not require re-training of classifiers. The plain resubstitution error estimator corresponds to choosing the standard empirical measure. Other choices of empirical measure lead to bolstered, posterior-probability, Gaussian-process, and Bayesian error estimators; in addition, we propose bolstered posterior-probability error estimators as a new family of generalized resubstitution estimators. In the two-class case, we show that a generalized resubstitution estimator is consistent and asymptotically unbiased, regardless of the distribution of the features and label, if the corresponding generalized empirical measure converges uniformly to the standard empirical measure and the classification rule has a finite VC dimension. A generalized resubstitution estimator typically has hyperparameters that can be tuned to control its bias and variance, which adds flexibility. Numerical experiments with various classification rules trained on synthetic data assess the thefinite-sample performance of several representative generalized resubstitution error estimators. In addition, results of an image classification experiment using the LeNet-5 convolutional neural network and the MNIST data set demonstrate the potential of this class of error estimators in deep learning for computer vision.
Self-Adaptive Physics-Informed Neural Networks using a Soft Attention Mechanism
McClenny, Levi, Braga-Neto, Ulisses
Physics-Informed Neural Networks (PINNs) have emerged recently as a promising application of deep neural networks to the numerical solution of nonlinear partial differential equations (PDEs). However, the original PINN algorithm is known to suffer from stability and accuracy problems in cases where the solution has sharp spatio-temporal transitions. These stiff PDEs require an unreasonably large number of collocation points to be solved accurately. It has been recognized that adaptive procedures are needed to force the neural network to fit accurately the stubborn spots in the solution of stiff PDEs. To accomplish this, previous approaches have used fixed weights hard-coded over regions of the solution deemed to be important. In this paper, we propose a fundamentally new method to train PINNs adaptively, where the adaptation weights are fully trainable, so the neural network learns by itself which regions of the solution are difficult and is forced to focus on them, which is reminiscent of soft multiplicative-mask attention mechanism used in computer vision. The basic idea behind these Self-Adaptive PINNs is to make the weights increase where the corresponding loss is higher, which is accomplished by training the network to simultaneously minimize the losses and maximize the weights, i.e., to find a saddle point in the cost surface. We show that this is formally equivalent to solving a PDE-constrained optimization problem using a penalty-based method, though in a way where the monotonically-nondecreasing penalty coefficients are trainable. Numerical experiments with an Allen-Cahn stiff PDE, the Self-Adaptive PINN outperformed other state-of-the-art PINN algorithms in L2 error by a wide margin, while using a smaller number of training epochs. An Appendix contains additional results with Burger's and Helmholtz PDEs, which confirmed the trends observed in the Allen-Cahn experiments.
A Semi-Supervised Generative Adversarial Network for Prediction of Genetic Disease Outcomes
Davi, Caio, Braga-Neto, Ulisses
For most diseases, building large databases of labeled genetic data is an expensive and time-demanding task. To address this, we introduce genetic Generative Adversarial Networks (gGAN), a semi-supervised approach based on an innovative GAN architecture to create large synthetic genetic data sets starting with a small amount of labeled data and a large amount of unlabeled data. Our goal is to determine the propensity of a new individual to develop the severe form of the illness from their genetic profile alone. The proposed model achieved satisfactory results using real genetic data from different datasets and populations, in which the test populations may not have the same genetic profiles. The proposed model is self-aware and capable of determining whether a new genetic profile has enough compatibility with the data on which the network was trained and is thus suitable for prediction. The code and datasets used can be found at https://github.com/caio-davi/gGAN.
Control of Gene Regulatory Networks with Noisy Measurements and Uncertain Inputs
Imani, Mahdi, Braga-Neto, Ulisses
This paper is concerned with the problem of stochastic control of gene regulatory networks (GRNs) observed indirectly through noisy measurements and with uncertainty in the intervention inputs. The partial observability of the gene states and uncertainty in the intervention process are accounted for by modeling GRNs using the partially-observed Boolean dynamical system (POBDS) signal model with noisy gene expression measurements. Obtaining the optimal infinite-horizon control strategy for this problem is not attainable in general, and we apply reinforcement learning and Gaussian process techniques to find a near-optimal solution. The POBDS is first transformed to a directly-observed Markov Decision Process in a continuous belief space, and the Gaussian process is used for modeling the cost function over the belief and intervention spaces. Reinforcement learning then is used to learn the cost function from the available gene expression data. In addition, we employ sparsification, which enables the control of large partially-observed GRNs. The performance of the resulting algorithm is studied through a comprehensive set of numerical experiments using synthetic gene expression data generated from a melanoma gene regulatory network.