
 Mishra, Siddhartha


A Survey on Oversmoothing in Graph Neural Networks

arXiv.org Artificial Intelligence

Node features of graph neural networks (GNNs) tend to become increasingly similar as network depth grows. This effect is known as over-smoothing, which we axiomatically define as the exponential convergence of suitable similarity measures on the node features. Our definition unifies previous approaches and gives rise to new quantitative measures of over-smoothing. Moreover, we empirically demonstrate this behavior for several over-smoothing measures on different graphs (small-, medium-, and large-scale). We also review several approaches for mitigating over-smoothing and empirically test their effectiveness on real-world graph datasets. Through illustrative examples, we demonstrate that mitigating over-smoothing is a necessary but not sufficient condition for building deep GNNs that are expressive on a wide range of graph learning tasks. Finally, we extend our definition of over-smoothing to the rapidly emerging field of continuous-time GNNs.
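
To make one such similarity measure concrete, the sketch below tracks a degree-normalized Dirichlet energy of the node features; exponential decay of this quantity with depth is the behavior the definition above formalizes. The helper name and exact normalization are illustrative assumptions, not the survey's code.

```python
# Minimal sketch of a depth-wise over-smoothing diagnostic. Assumption: a
# degree-normalized Dirichlet energy as the similarity measure; the helper
# name and normalization are illustrative, not taken from the survey.
import torch

def dirichlet_energy(x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
    """E(X) = (1/|V|) * sum_{(i,j) in E} ||x_i/sqrt(1+d_i) - x_j/sqrt(1+d_j)||^2."""
    num_nodes = x.size(0)
    src, dst = edge_index                      # shape [2, num_edges]
    deg = torch.zeros(num_nodes, dtype=x.dtype, device=x.device)
    deg.scatter_add_(0, src, torch.ones_like(src, dtype=x.dtype))
    xn = x / (1.0 + deg).sqrt().unsqueeze(-1)
    diff = xn[src] - xn[dst]
    return diff.pow(2).sum() / num_nodes

# Evaluating this measure on the features produced by each layer of a GNN and
# observing exponential decay in the layer index is the signature behavior
# formalized by the definition above.
```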


Gradient Gating for Deep Multi-Rate Learning on Graphs

arXiv.org Artificial Intelligence

We present Gradient Gating (G$^2$), a novel framework for improving the performance of Graph Neural Networks (GNNs). Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph. Local gradients are harnessed to further modulate message passing updates. Our framework flexibly allows one to use any basic GNN layer as a wrapper around which the multi-rate gradient gating mechanism is built. We rigorously prove that G$^2$ alleviates the oversmoothing problem and allows the design of deep GNNs. Empirical results are presented to demonstrate that the proposed framework achieves state-of-the-art performance on a variety of graph learning tasks, including on large-scale heterophilic graphs.
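
As a rough illustration of the gating idea, the following sketch builds per-node, per-channel rates from p-th powers of local feature differences of an auxiliary GNN output and uses them to interpolate between the old state and a candidate update. The components gnn, gnn_hat, the nonlinearity, and the exact gate definition are assumptions based on the abstract, not a verbatim reproduction of G$^2$.

```python
# Hedged sketch of a gradient-gated layer. Assumptions: gates are built from
# p-th powers of local feature differences of an auxiliary GNN output, and the
# layer output interpolates between the old state and a candidate update;
# gnn, gnn_hat and p are placeholders, not the paper's exact components.
import torch

def gradient_gated_layer(x, edge_index, gnn, gnn_hat, p: float = 2.0):
    src, dst = edge_index                      # shape [2, num_edges]
    y = torch.relu(gnn(x, edge_index))         # candidate update
    y_hat = gnn_hat(x, edge_index)             # auxiliary output used for gating
    # local "graph gradient": sum |y_hat_j - y_hat_i|^p over neighbors j of i
    grad_p = torch.zeros_like(y_hat)
    grad_p.index_add_(0, src, (y_hat[dst] - y_hat[src]).abs().pow(p))
    tau = torch.sigmoid(grad_p)                # per-node, per-channel rates
    return (1.0 - tau) * x + tau * y           # multi-rate gated update
```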


Multi-Scale Message Passing Neural PDE Solvers

arXiv.org Artificial Intelligence

We propose a novel multi-scale message passing neural network algorithm for learning the solutions of time-dependent PDEs. Our algorithm possesses both temporal and spatial multi-scale resolution features by incorporating multi-scale sequence models and graph gating modules in the encoder and processor, respectively. Benchmark numerical experiments are presented to demonstrate that the proposed algorithm outperforms baselines, particularly on a PDE with a range of spatial and temporal scales.
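
As a generic illustration of multi-scale message passing (not the paper's encoder/processor design), the sketch below combines message passing on the original graph with message passing on a pooled, coarser graph; the cluster assignment, mean pooling, and additive combination are placeholder assumptions.

```python
# Generic two-scale message-passing step (illustration only; the cluster
# assignment, mean pooling and additive combination are assumptions and do
# not reproduce the paper's encoder/processor architecture).
import torch

def two_scale_step(x, edge_index, coarse_edge_index, cluster, fine_mp, coarse_mp):
    # fine-scale message passing on the original graph
    x_fine = fine_mp(x, edge_index)
    # pool node features to a coarse graph by averaging over clusters
    num_clusters = int(cluster.max()) + 1
    ones = torch.ones(x.size(0), 1, device=x.device)
    counts = torch.zeros(num_clusters, 1, device=x.device).index_add_(0, cluster, ones)
    x_coarse = torch.zeros(num_clusters, x.size(1), device=x.device).index_add_(0, cluster, x)
    x_coarse = coarse_mp(x_coarse / counts.clamp(min=1.0), coarse_edge_index)
    # broadcast coarse information back to the nodes and combine
    return x_fine + x_coarse[cluster]
```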


Error estimates for physics informed neural networks approximating the Navier-Stokes equations

arXiv.org Artificial Intelligence

We prove rigorous bounds on the errors resulting from the approximation of the incompressible Navier-Stokes equations with (extended) physics-informed neural networks. We show that the underlying PDE residual can be made arbitrarily small for tanh neural networks with two hidden layers. Moreover, the total error can be estimated in terms of the training error, network size and number of quadrature points. The theory is illustrated with numerical experiments.
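
For orientation, a minimal sketch of the setting: a tanh network with two hidden layers mapping (t, x, y) to (u, v, p), whose 2D incompressible Navier-Stokes residual is evaluated with automatic differentiation. The width, viscosity value, and helper names are placeholder choices, not the paper's configuration.

```python
# Minimal PINN residual sketch for 2D incompressible Navier-Stokes
# (illustration: a tanh network with two hidden layers mapping (t, x, y)
# to (u, v, p); nu and the network width are placeholder choices).
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 3),
)

def grad(f, x):
    return torch.autograd.grad(f, x, grad_outputs=torch.ones_like(f), create_graph=True)[0]

def ns_residual(txy, nu=0.01):
    txy = txy.requires_grad_(True)             # collocation points (t, x, y)
    u, v, p = net(txy).unbind(dim=-1)
    du, dv, dp = grad(u, txy), grad(v, txy), grad(p, txy)
    u_t, u_x, u_y = du.unbind(dim=-1)
    v_t, v_x, v_y = dv.unbind(dim=-1)
    lap_u = grad(u_x, txy)[:, 1] + grad(u_y, txy)[:, 2]
    lap_v = grad(v_x, txy)[:, 1] + grad(v_y, txy)[:, 2]
    mom_u = u_t + u * u_x + v * u_y + dp[:, 1] - nu * lap_u   # x-momentum
    mom_v = v_t + u * v_x + v * v_y + dp[:, 2] - nu * lap_v   # y-momentum
    div = u_x + v_y                                           # incompressibility
    return mom_u, mom_v, div   # train by minimizing the mean squares of these
```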


wPINNs: Weak Physics informed neural networks for approximating entropy solutions of hyperbolic conservation laws

arXiv.org Artificial Intelligence

Physics informed neural networks (PINNs) require regularity of solutions of the underlying PDE to guarantee accurate approximation. Consequently, they may fail at approximating discontinuous solutions of PDEs such as nonlinear hyperbolic equations. To ameliorate this, we propose a novel variant of PINNs, termed weak PINNs (wPINNs), for accurate approximation of entropy solutions of scalar conservation laws. wPINNs are based on approximating the solution of a min-max optimization problem for a residual, defined in terms of Kruzkhov entropies, to determine parameters for the neural networks approximating the entropy solution as well as the test functions. We prove rigorous bounds on the error incurred by wPINNs and illustrate their performance through numerical experiments to demonstrate that wPINNs can approximate entropy solutions accurately.
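
Schematically, and up to sign conventions, boundary and initial-condition terms, and the exact normalization used in the paper, the Kruzkhov entropy residual behind the min-max problem can be written as

```latex
% Schematic Kruzkhov entropy residual and min-max formulation (sign
% conventions, boundary/initial terms and normalization are assumptions).
\[
\mathcal{R}(u_\theta,\varphi_\eta,c)
  = -\int_0^T\!\!\int_D \Big( \,|u_\theta - c|\,\partial_t\varphi_\eta
    + \operatorname{sgn}(u_\theta - c)\,\bigl(f(u_\theta)-f(c)\bigr)\,\partial_x\varphi_\eta \Big)\,\mathrm{d}x\,\mathrm{d}t,
\qquad
\theta^\ast \in \arg\min_\theta\,\max_{\eta,\,c}\ \mathcal{R}(u_\theta,\varphi_\eta,c),
\]
```

with $u_\theta$ the network approximating the entropy solution and $\varphi_\eta$ the network parametrizing the test functions.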


Error analysis for deep neural network approximations of parametric hyperbolic conservation laws

arXiv.org Artificial Intelligence

We derive rigorous bounds on the error resulting from the approximation of the solution of parametric hyperbolic scalar conservation laws with ReLU neural networks. We show that the approximation error can be made as small as desired with ReLU neural networks that overcome the curse of dimensionality. In addition, we provide an explicit upper bound on the generalization error in terms of the training error, number of training samples and the neural network size. The theoretical results are illustrated by numerical experiments.


Graph-Coupled Oscillator Networks

arXiv.org Machine Learning

These models have recently been successfully applied in a variety of tasks such as computer vision and graphics Monti et al. (2017), recommender systems Ying et al. (2018), transportation Derrow-Pinion et al. (2021), computational chemistry (Gilmer et al., 2017), drug discovery Gaudelet et al. (2021), physics (Shlomi et al., 2020), and analysis of social networks (see Zhou et al. (2019); Bronstein et al. (2021) for additional applications). Several recent works proposed Graph ML models based on differential equations coming from physics Avelar et al. (2019); Poli et al. (2019b); Zhuang et al. (2020); Xhonneux et al. (2020b), including diffusion Chamberlain et al. (2021b) and wave Eliasof et al. (2021) equations and geometric equations such as Beltrami Chamberlain et al. (2021a) and Ricci Topping et al. (2021) flows. Such approaches not only allow popular GNN models to be recovered as discretization schemes for the underlying differential equations, but can also, in some cases, address problems encountered in traditional GNNs such as oversmoothing Nt & Maehara (2019); Oono & Suzuki (2020) and bottlenecks Alon & Yahav (2021). In this paper, we propose a novel physically-inspired approach to learning on graphs. Our framework, termed GraphCON (Graph-Coupled Oscillator Network), builds upon suitable time-discretizations of a specific class of ordinary differential equations (ODEs) that model the dynamics of a network of non-linear forced and damped oscillators, which are coupled via the adjacency structure of the underlying graph. Graph-coupled oscillators are often encountered in mechanical, electronic, and biological systems, and have been studied extensively Strogatz (2015), with a prominent example being functional circuits in the brain such as cortical columns Stiefel & Ermentrout (2016).
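
A minimal sketch of one step of such a scheme, assuming an IMEX-style discretization of X'' = σ(F_θ(X)) − γX − αX' with F_θ an arbitrary GNN coupling layer; the constants, the nonlinearity, and the function coupling are placeholders rather than the paper's exact update.

```python
# Sketch of a graph-coupled oscillator update (assumption: an IMEX-style
# discretization of X'' = sigma(F_theta(X)) - gamma*X - alpha*X', where
# F_theta is any GNN coupling layer; dt, alpha, gamma are hyperparameters).
import torch

def oscillator_step(x, y, edge_index, coupling, dt=1.0, alpha=1.0, gamma=1.0):
    """One step for node features x and their 'velocities' y."""
    y = y + dt * (torch.relu(coupling(x, edge_index)) - gamma * x - alpha * y)
    x = x + dt * y
    return x, y
```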


Long Expressive Memory for Sequence Modeling

arXiv.org Machine Learning

Learning tasks with sequential data as inputs (and possibly outputs) arise in a wide variety of contexts, including computer vision, text and speech recognition, natural language processing, and time series analysis in the sciences and engineering. While recurrent gradient-based models have been successfully used in processing sequential data sets, it is well-known that training these models to process (very) long sequential inputs is extremely challenging on account of the so-called exploding and vanishing gradients problem [32]. This arises as calculating hidden state gradients entails the computation of an iterative product of gradients over a large number of steps. Consequently, this (long) product can easily grow or decay exponentially in the number of recurrent interactions. Mitigation of the exploding and vanishing gradients problem has received considerable attention in the literature. A classical approach, used in Long Short-Term Memory (LSTM) [18] and Gated Recurrent Units (GRUs) [11], relies on gating mechanisms and leverages the resulting additive structure to ensure that gradients do not vanish.
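
A tiny numerical illustration of this point: multiplying many step-to-step Jacobians whose scale sits slightly below or above one makes the resulting gradient norm decay or grow exponentially with the number of steps. The matrices and scales below are arbitrary choices, not taken from the paper.

```python
# Products of per-step Jacobians: norms scale roughly like scale**steps,
# so gradients vanish for scale < 1 and explode for scale > 1.
import torch

torch.manual_seed(0)
d, steps = 32, 200
for scale in (0.95, 1.05):
    prod = torch.eye(d)
    for _ in range(steps):
        jac = scale * torch.linalg.qr(torch.randn(d, d))[0]  # scaled orthogonal matrix
        prod = jac @ prod
    print(scale, torch.linalg.norm(prod).item())  # ~0.95**200 vs ~1.05**200 scaling
```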


UnICORNN: A recurrent model for learning very long time dependencies

arXiv.org Machine Learning

The design of recurrent neural networks (RNNs) to accurately process sequential inputs with long-time dependencies is very challenging on account of the exploding and vanishing gradient problem. To overcome this, we propose a novel RNN architecture which is based on a structure preserving discretization of a Hamiltonian system of second-order ordinary differential equations that models networks of oscillators. The resulting RNN is fast, invertible (in time), and memory efficient, and we derive rigorous bounds on the hidden state gradients to prove the mitigation of the exploding and vanishing gradient problem. A suite of experiments is presented to demonstrate that the proposed RNN provides state-of-the-art performance on a variety of learning tasks with (very) long time dependencies.


Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies

arXiv.org Machine Learning

Circuits of biological neurons, such as those in the functional parts of the brain, can be modeled as networks of coupled oscillators. Inspired by the ability of these systems to express a rich set of outputs while keeping (gradients of) state variables bounded, we propose a novel architecture for recurrent neural networks. Our proposed RNN is based on a time-discretization of a system of second-order ordinary differential equations, modeling networks of controlled nonlinear oscillators. We prove precise bounds on the gradients of the hidden states, leading to the mitigation of the exploding and vanishing gradient problem for this RNN. Experiments show that the proposed RNN is comparable in performance to the state of the art on a variety of benchmarks, demonstrating the potential of this architecture to provide stable and accurate RNNs for processing complex sequential data.
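
A hedged sketch of a coupled-oscillator recurrent cell in this spirit, assuming an implicit-explicit discretization of y'' = σ(W[y, y'] + Vu + b) − γy − εy'; the weight shapes, nonlinearity, and constants are placeholder choices rather than the paper's exact cell.

```python
# Hedged sketch of a coupled-oscillator recurrent cell (assumption: an
# implicit-explicit discretization of y'' = sigma(W[y, y'] + V u + b)
# - gamma*y - eps*y'; weight shapes and constants are placeholder choices).
import torch

class OscillatorCell(torch.nn.Module):
    def __init__(self, input_size, hidden_size, dt=0.1, gamma=1.0, eps=0.1):
        super().__init__()
        self.W = torch.nn.Linear(2 * hidden_size, hidden_size)   # acts on [y, z]
        self.V = torch.nn.Linear(input_size, hidden_size, bias=False)
        self.dt, self.gamma, self.eps = dt, gamma, eps

    def forward(self, u, y, z):
        # z plays the role of the hidden-state velocity y'
        force = torch.tanh(self.W(torch.cat([y, z], dim=-1)) + self.V(u))
        z = (z + self.dt * (force - self.gamma * y)) / (1.0 + self.dt * self.eps)
        y = y + self.dt * z
        return y, z
```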