
 Mishra, Siddhartha


A Survey on Oversmoothing in Graph Neural Networks

arXiv.org Artificial Intelligence

Node features of graph neural networks (GNNs) tend to become increasingly similar as network depth grows. This effect is known as over-smoothing, which we axiomatically define as the exponential convergence of suitable similarity measures on the node features. Our definition unifies previous approaches and gives rise to new quantitative measures of over-smoothing. Moreover, we empirically demonstrate this behavior for several over-smoothing measures on different graphs (small-, medium-, and large-scale). We also review several approaches for mitigating over-smoothing and empirically test their effectiveness on real-world graph datasets. Through illustrative examples, we demonstrate that mitigating over-smoothing is a necessary but not sufficient condition for building deep GNNs that are expressive on a wide range of graph learning tasks. Finally, we extend our definition of over-smoothing to the rapidly emerging field of continuous-time GNNs.
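
To make one such similarity measure concrete, the sketch below tracks a degree-normalized Dirichlet energy of the node features; exponential decay of this quantity with depth is the behavior the definition above formalizes. The helper name and exact normalization are illustrative assumptions, not the survey's code.

```python
# Minimal sketch of a depth-wise over-smoothing diagnostic. Assumption: a
# degree-normalized Dirichlet energy as the similarity measure; the helper
# name and normalization are illustrative, not taken from the survey.
import torch

def dirichlet_energy(x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
    """E(X) = (1/|V|) * sum_{(i,j) in E} ||x_i/sqrt(1+d_i) - x_j/sqrt(1+d_j)||^2."""
    num_nodes = x.size(0)
    src, dst = edge_index                      # shape [2, num_edges]
    deg = torch.zeros(num_nodes, dtype=x.dtype, device=x.device)
    deg.scatter_add_(0, src, torch.ones_like(src, dtype=x.dtype))
    xn = x / (1.0 + deg).sqrt().unsqueeze(-1)
    diff = xn[src] - xn[dst]
    return diff.pow(2).sum() / num_nodes

# Evaluating this measure on the features produced by each layer of a GNN and
# observing exponential decay in the layer index is the signature behavior
# formalized by the definition above.
```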


Gradient Gating for Deep Multi-Rate Learning on Graphs

arXiv.org Artificial Intelligence

We present Gradient Gating (G$^2$), a novel framework for improving the performance of Graph Neural Networks (GNNs). Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph. Local gradients are harnessed to further modulate message passing updates. Our framework flexibly allows one to use any basic GNN layer as a wrapper around which the multi-rate gradient gating mechanism is built. We rigorously prove that G$^2$ alleviates the oversmoothing problem and allows the design of deep GNNs. Empirical results are presented to demonstrate that the proposed framework achieves state-of-the-art performance on a variety of graph learning tasks, including on large-scale heterophilic graphs.
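
As a rough illustration of the gating idea, the following sketch builds per-node, per-channel rates from p-th powers of local feature differences of an auxiliary GNN output and uses them to interpolate between the old state and a candidate update. The components gnn, gnn_hat, the nonlinearity, and the exact gate definition are assumptions based on the abstract, not a verbatim reproduction of G$^2$.

```python
# Hedged sketch of a gradient-gated layer. Assumptions: gates are built from
# p-th powers of local feature differences of an auxiliary GNN output, and the
# layer output interpolates between the old state and a candidate update;
# gnn, gnn_hat and p are placeholders, not the paper's exact components.
import torch

def gradient_gated_layer(x, edge_index, gnn, gnn_hat, p: float = 2.0):
    src, dst = edge_index                      # shape [2, num_edges]
    y = torch.relu(gnn(x, edge_index))         # candidate update
    y_hat = gnn_hat(x, edge_index)             # auxiliary output used for gating
    # local "graph gradient": sum |y_hat_j - y_hat_i|^p over neighbors j of i
    grad_p = torch.zeros_like(y_hat)
    grad_p.index_add_(0, src, (y_hat[dst] - y_hat[src]).abs().pow(p))
    tau = torch.sigmoid(grad_p)                # per-node, per-channel rates
    return (1.0 - tau) * x + tau * y           # multi-rate gated update
```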


Multi-Scale Message Passing Neural PDE Solvers

arXiv.org Artificial Intelligence

We propose a novel multi-scale message passing neural network algorithm for learning the solutions of time-dependent PDEs. Our algorithm possesses both temporal and spatial multi-scale resolution features by incorporating multi-scale sequence models and graph gating modules in the encoder and processor, respectively. Benchmark numerical experiments are presented to demonstrate that the proposed algorithm outperforms baselines, particularly on a PDE with a range of spatial and temporal scales.
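
As a generic illustration of multi-scale message passing (not the paper's encoder/processor design), the sketch below combines message passing on the original graph with message passing on a pooled, coarser graph; the cluster assignment, mean pooling, and additive combination are placeholder assumptions.

```python
# Generic two-scale message-passing step (illustration only; the cluster
# assignment, mean pooling and additive combination are assumptions and do
# not reproduce the paper's encoder/processor architecture).
import torch

def two_scale_step(x, edge_index, coarse_edge_index, cluster, fine_mp, coarse_mp):
    # fine-scale message passing on the original graph
    x_fine = fine_mp(x, edge_index)
    # pool node features to a coarse graph by averaging over clusters
    num_clusters = int(cluster.max()) + 1
    ones = torch.ones(x.size(0), 1, device=x.device)
    counts = torch.zeros(num_clusters, 1, device=x.device).index_add_(0, cluster, ones)
    x_coarse = torch.zeros(num_clusters, x.size(1), device=x.device).index_add_(0, cluster, x)
    x_coarse = coarse_mp(x_coarse / counts.clamp(min=1.0), coarse_edge_index)
    # broadcast coarse information back to the nodes and combine
    return x_fine + x_coarse[cluster]
```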


Error estimates for physics informed neural networks approximating the Navier-Stokes equations

arXiv.org Artificial Intelligence

We prove rigorous bounds on the errors resulting from the approximation of the incompressible Navier-Stokes equations with (extended) physics-informed neural networks. We show that the underlying PDE residual can be made arbitrarily small for tanh neural networks with two hidden layers. Moreover, the total error can be estimated in terms of the training error, network size and number of quadrature points. The theory is illustrated with numerical experiments.
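
For orientation, a minimal sketch of the setting: a tanh network with two hidden layers mapping (t, x, y) to (u, v, p), whose 2D incompressible Navier-Stokes residual is evaluated with automatic differentiation. The width, viscosity value, and helper names are placeholder choices, not the paper's configuration.

```python
# Minimal PINN residual sketch for 2D incompressible Navier-Stokes
# (illustration: a tanh network with two hidden layers mapping (t, x, y)
# to (u, v, p); nu and the network width are placeholder choices).
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 3),
)

def grad(f, x):
    return torch.autograd.grad(f, x, grad_outputs=torch.ones_like(f), create_graph=True)[0]

def ns_residual(txy, nu=0.01):
    txy = txy.requires_grad_(True)             # collocation points (t, x, y)
    u, v, p = net(txy).unbind(dim=-1)
    du, dv, dp = grad(u, txy), grad(v, txy), grad(p, txy)
    u_t, u_x, u_y = du.unbind(dim=-1)
    v_t, v_x, v_y = dv.unbind(dim=-1)
    lap_u = grad(u_x, txy)[:, 1] + grad(u_y, txy)[:, 2]
    lap_v = grad(v_x, txy)[:, 1] + grad(v_y, txy)[:, 2]
    mom_u = u_t + u * u_x + v * u_y + dp[:, 1] - nu * lap_u   # x-momentum
    mom_v = v_t + u * v_x + v * v_y + dp[:, 2] - nu * lap_v   # y-momentum
    div = u_x + v_y                                           # incompressibility
    return mom_u, mom_v, div   # train by minimizing the mean squares of these
```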


wPINNs: Weak Physics informed neural networks for approximating entropy solutions of hyperbolic conservation laws

arXiv.org Artificial Intelligence

Physics informed neural networks (PINNs) require regularity of solutions of the underlying PDE to guarantee accurate approximation. Consequently, they may fail at approximating discontinuous solutions of PDEs such as nonlinear hyperbolic equations. To ameliorate this, we propose a novel variant of PINNs, termed weak PINNs (wPINNs), for accurate approximation of entropy solutions of scalar conservation laws. wPINNs are based on approximating the solution of a min-max optimization problem for a residual, defined in terms of Kruzkhov entropies, to determine parameters for the neural networks approximating the entropy solution as well as the test functions. We prove rigorous bounds on the error incurred by wPINNs and illustrate their performance through numerical experiments to demonstrate that wPINNs can approximate entropy solutions accurately.
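
Schematically, and up to sign conventions, boundary and initial-condition terms, and the exact normalization used in the paper, the Kruzkhov entropy residual behind the min-max problem can be written as

```latex
% Schematic Kruzkhov entropy residual and min-max formulation (sign
% conventions, boundary/initial terms and normalization are assumptions).
\[
\mathcal{R}(u_\theta,\varphi_\eta,c)
  = -\int_0^T\!\!\int_D \Big( \,|u_\theta - c|\,\partial_t\varphi_\eta
    + \operatorname{sgn}(u_\theta - c)\,\bigl(f(u_\theta)-f(c)\bigr)\,\partial_x\varphi_\eta \Big)\,\mathrm{d}x\,\mathrm{d}t,
\qquad
\theta^\ast \in \arg\min_\theta\,\max_{\eta,\,c}\ \mathcal{R}(u_\theta,\varphi_\eta,c),
\]
```

with $u_\theta$ the network approximating the entropy solution and $\varphi_\eta$ the network parametrizing the test functions.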


Error analysis for deep neural network approximations of parametric hyperbolic conservation laws

arXiv.org Artificial Intelligence

We derive rigorous bounds on the error resulting from the approximation of the solution of parametric hyperbolic scalar conservation laws with ReLU neural networks. We show that the approximation error can be made as small as desired with ReLU neural networks that overcome the curse of dimensionality. In addition, we provide an explicit upper bound on the generalization error in terms of the training error, number of training samples and the neural network size. The theoretical results are illustrated by numerical experiments.


Graph-Coupled Oscillator Networks

arXiv.org Machine Learning

These models have recently been successfully applied in a variety of tasks such as computer vision and graphics Monti et al. (2017), recommender systems Ying et al. (2018), transportation Derrow-Pinion et al. (2021), computational chemistry (Gilmer et al., 2017), drug discovery Gaudelet et al. (2021), physics (Shlomi et al., 2020), and analysis of social networks (see Zhou et al. (2019); Bronstein et al. (2021) for additional applications). Several recent works proposed Graph ML models based on differential equations coming from physics Avelar et al. (2019); Poli et al. (2019b); Zhuang et al. (2020); Xhonneux et al. (2020b), including diffusion Chamberlain et al. (2021b) and wave Eliasof et al. (2021) equations and geometric equations such as Beltrami Chamberlain et al. (2021a) and Ricci Topping et al. (2021) flows. Such approaches not only allow popular GNN models to be recovered as discretization schemes for the underlying differential equations, but can also, in some cases, address problems encountered in traditional GNNs such as oversmoothing Nt & Maehara (2019); Oono & Suzuki (2020) and bottlenecks Alon & Yahav (2021). In this paper, we propose a novel physically-inspired approach to learning on graphs. Our framework, termed GraphCON (Graph-Coupled Oscillator Network), builds upon suitable time-discretizations of a specific class of ordinary differential equations (ODEs) that model the dynamics of a network of non-linear forced and damped oscillators, which are coupled via the adjacency structure of the underlying graph. Graph-coupled oscillators are often encountered in mechanical, electronic, and biological systems, and have been studied extensively Strogatz (2015), with a prominent example being functional circuits in the brain such as cortical columns Stiefel & Ermentrout (2016).
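
A minimal sketch of one step of such a scheme, assuming an IMEX-style discretization of X'' = σ(F_θ(X)) − γX − αX' with F_θ an arbitrary GNN coupling layer; the constants, the nonlinearity, and the function coupling are placeholders rather than the paper's exact update.

```python
# Sketch of a graph-coupled oscillator update (assumption: an IMEX-style
# discretization of X'' = sigma(F_theta(X)) - gamma*X - alpha*X', where
# F_theta is any GNN coupling layer; dt, alpha, gamma are hyperparameters).
import torch

def oscillator_step(x, y, edge_index, coupling, dt=1.0, alpha=1.0, gamma=1.0):
    """One step for node features x and their 'velocities' y."""
    y = y + dt * (torch.relu(coupling(x, edge_index)) - gamma * x - alpha * y)
    x = x + dt * y
    return x, y
```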


Long Expressive Memory for Sequence Modeling

arXiv.org Machine Learning

Learning tasks with sequential data as inputs (and possibly outputs) arise in a wide variety of contexts, including computer vision, text and speech recognition, natural language processing, and time series analysis in the sciences and engineering. While recurrent gradient-based models have been successfully used in processing sequential data sets, it is well-known that training these models to process (very) long sequential inputs is extremely challenging on account of the so-called exploding and vanishing gradients problem [32]. This arises as calculating hidden state gradients entails the computation of an iterative product of gradients over a large number of steps. Consequently, this (long) product can easily grow or decay exponentially in the number of recurrent interactions. Mitigation of the exploding and vanishing gradients problem has received considerable attention in the literature. A classical approach, used in Long Short-Term Memory (LSTM) [18] and Gated Recurrent Units (GRUs) [11], relies on gating mechanisms and leverages the resulting additive structure to ensure that gradients do not vanish.
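
A tiny numerical illustration of this point: multiplying many step-to-step Jacobians whose scale sits slightly below or above one makes the resulting gradient norm decay or grow exponentially with the number of steps. The matrices and scales below are arbitrary choices, not taken from the paper.

```python
# Products of per-step Jacobians: norms scale roughly like scale**steps,
# so gradients vanish for scale < 1 and explode for scale > 1.
import torch

torch.manual_seed(0)
d, steps = 32, 200
for scale in (0.95, 1.05):
    prod = torch.eye(d)
    for _ in range(steps):
        jac = scale * torch.linalg.qr(torch.randn(d, d))[0]  # scaled orthogonal matrix
        prod = jac @ prod
    print(scale, torch.linalg.norm(prod).item())  # ~0.95**200 vs ~1.05**200 scaling
```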


UnICORNN: A recurrent model for learning very long time dependencies

arXiv.org Machine Learning

The design of recurrent neural networks (RNNs) to accurately process sequential inputs with long-time dependencies is very challenging on account of the exploding and vanishing gradient problem. To overcome this, we propose a novel RNN architecture which is based on a structure preserving discretization of a Hamiltonian system of second-order ordinary differential equations that models networks of oscillators. The resulting RNN is fast, invertible (in time), and memory efficient, and we derive rigorous bounds on the hidden state gradients to prove the mitigation of the exploding and vanishing gradient problem. A suite of experiments is presented to demonstrate that the proposed RNN provides state-of-the-art performance on a variety of learning tasks with (very) long time dependencies.


Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies

arXiv.org Machine Learning

Circuits of biological neurons, such as those in the functional parts of the brain, can be modeled as networks of coupled oscillators. Inspired by the ability of these systems to express a rich set of outputs while keeping (gradients of) state variables bounded, we propose a novel architecture for recurrent neural networks. Our proposed RNN is based on a time-discretization of a system of second-order ordinary differential equations, modeling networks of controlled nonlinear oscillators. We prove precise bounds on the gradients of the hidden states, leading to the mitigation of the exploding and vanishing gradient problem for this RNN. Experiments show that the proposed RNN is comparable in performance to the state of the art on a variety of benchmarks, demonstrating the potential of this architecture to provide stable and accurate RNNs for processing complex sequential data.
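
A hedged sketch of a coupled-oscillator recurrent cell in this spirit, assuming an implicit-explicit discretization of y'' = σ(W[y, y'] + Vu + b) − γy − εy'; the weight shapes, nonlinearity, and constants are placeholder choices rather than the paper's exact cell.

```python
# Hedged sketch of a coupled-oscillator recurrent cell (assumption: an
# implicit-explicit discretization of y'' = sigma(W[y, y'] + V u + b)
# - gamma*y - eps*y'; weight shapes and constants are placeholder choices).
import torch

class OscillatorCell(torch.nn.Module):
    def __init__(self, input_size, hidden_size, dt=0.1, gamma=1.0, eps=0.1):
        super().__init__()
        self.W = torch.nn.Linear(2 * hidden_size, hidden_size)   # acts on [y, z]
        self.V = torch.nn.Linear(input_size, hidden_size, bias=False)
        self.dt, self.gamma, self.eps = dt, gamma, eps

    def forward(self, u, y, z):
        # z plays the role of the hidden-state velocity y'
        force = torch.tanh(self.W(torch.cat([y, z], dim=-1)) + self.V(u))
        z = (z + self.dt * (force - self.gamma * y)) / (1.0 + self.dt * self.eps)
        y = y + self.dt * z
        return y, z
```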