taylor coefficient
Learning Differential Equations that are Easy to Solve
Kelly, Jacob, Bettencourt, Jesse, Johnson, Matthew James, Duvenaud, David
Differential equations parameterized by neural networks become expensive to solve numerically as training progresses. We propose a remedy that encourages learned dynamics to be easier to solve. Specifically, we introduce a differentiable surrogate for the time cost of standard numerical solvers, using higher-order derivatives of solution trajectories. These derivatives are efficient to compute with Taylor-mode automatic differentiation. Optimizing this additional objective trades model performance against the time cost of solving the learned dynamics. We demonstrate our approach by training substantially faster, while nearly as accurate, models in supervised classification, density estimation, and time-series modelling tasks.
A Neural Network Perturbation Theory Based on the Born Series
Kaspschak, Bastian, Meißner, Ulf-G.
Deep Learning has become an attractive approach towards various data-based problems of theoretical physics in the past decade. Its protagonists, the deep neural networks (DNNs), are capable of making accurate predictions for data of arbitrarily high complexity. A well-known issue most DNNs share is their lack of interpretability. In order to explain their behavior and extract physical laws they have discovered during training, a suitable interpretation method has, therefore, to be applied post-hoc. Due to its simplicity and ubiquity in quantum physics, we decide to present a rather general interpretation method in the context of two-body scattering: We find a one-to-one correspondence between the $n^\text{th}$-order Born approximation and the $n^\text{th}$-order Taylor approximation of deep multilayer perceptrons (MLPs), that predict S-wave scattering lengths $a_0$ for discretized, attractive potentials of finite range. This defines a perturbation theory for MLPs similarily to Born approximations defining a perturbation theory for $a_0$. In the case of shallow potentials, lower-order approximations, that can be argued to be local interpretations of respective MLPs, reliably reproduce $a_0$. As deep MLPs are highly nested functions, the computation of higher-order partial derivatives, which is substantial for a Taylor approximation, is an effortful endeavour. By introducing quantities we refer to as propagators and vertices and that depend on the MLP's weights and biases, we establish a graph-theoretical approach towards partial derivatives and local interpretability. Similar to Feynman rules in quantum field theories, we find rules that systematically assign diagrams consisting of propagators and vertices to the corresponding order of the MLP perturbation theory.
The power of deeper networks for expressing natural functions
Deep learning has lately been shown to be a very powerful tool for a wide range of problems, from image segmentation to machine translation. Despite its success, many of the techniques developed by practitioners of artificial neural networks (ANNs) are heuristics without theoretical guarantees. Perhaps most notably, the power of feedforward networks with many layers (deep networks) has not been fully explained. The goal of this paper is to shed more light on this question and to suggest heuristics for how deep is deep enough. It is well-known [1-3] that nonlinear neural networks with a single hidden layer can approximate any function under reasonable assumptions, but it is possible that the networks required will be extremely large. Recent authors have shown that some functions can be approximated by deeper networks much more efficiently (i.e. with fewer neurons) than by shallower ones.