Limitations of neural network training due to numerical instability of backpropagation

Karner, Clemens, Kazeev, Vladimir, Petersen, Philipp Christian

arXiv.org Machine Learning 

Deep learning is a machine learning technique based on artificial neural networks which are trained by gradient-based methods and which have a large number of layers. This technique has been tremendously successful in a wide range of applications [26, 24, 44, 41]. Of particular interest for applied mathematicians are recent developments in which deep neural networks are applied to tasks of numerical analysis such as the numerical solution of inverse problems [1, 34, 27, 20, 38] or of (parametric) partial differential equations [7, 12, 39, 9, 40, 25, 29, 3]. The appeal of deep neural networks for these applications is due to their exceptional efficiency in representing functions from several approximation classes that underlie well-established numerical methods. In terms of approximation accuracy with respect to the number of approximation parameters, deep neural networks have been theoretically proven to achieve approximation rates that are at least as good as those of finite elements [15, 35, 30], local Taylor polynomials or splines [47, 11], wavelets [42] and, more generally, affine systems [5]. In the sequel, we consider neural networks with the rectified-linear-unit (ReLU) activation function, which is standard in most applications. In this case, the neural-network approximations are piecewise-affine functions. We point out that all state-of-the-art results on the rates of approximation with deep ReLU neural networks that achieve higher-order polynomial approximation rates are based on explicit constructions in which the number of affine pieces grows exponentially with respect to the number of layers; see, e.g., [47, 46]. In this work, we argue that this central building block, functions with exponentially many affine pieces, cannot be learned with the state-of-the-art techniques.
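To make the phrase "exponentially many affine pieces" concrete, the following is a minimal sketch of a standard sawtooth example: a hat function written with two ReLU units and composed with itself once per layer. It is an illustration chosen here, not necessarily the construction used in [47, 46], and the names `hat`, `sawtooth` and `count_affine_pieces` are hypothetical. The grid is dyadic, so every value is exact in floating-point arithmetic and slopes can be compared with `!=`.

```python
def relu(x):
    return max(x, 0.0)


def hat(x):
    # One ReLU layer of width 2 (illustrative assumption):
    # equals 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def sawtooth(x, depth):
    # Compose the hat function `depth` times, i.e. one hat per layer.
    for _ in range(depth):
        x = hat(x)
    return x


def count_affine_pieces(depth, refinement=2):
    # Sample on a dyadic grid fine enough that every breakpoint k / 2**depth
    # is a grid point, then count slope changes between adjacent grid cells.
    n = 2 ** (depth + refinement)
    values = [sawtooth(i / n, depth) for i in range(n + 1)]
    slopes = [(values[i + 1] - values[i]) * n for i in range(n)]
    changes = sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)
    return changes + 1


if __name__ == "__main__":
    for depth in range(1, 6):
        print(depth, count_affine_pieces(depth))
```

In this sketch each additional layer doubles the number of affine pieces on [0, 1] while adding only two ReLU units, so the piece count equals `2 ** depth`; the slope magnitude of each piece doubles as well.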
