Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint
Cyr, Eric C., Gulian, Mamikon A., Patel, Ravi G., Perego, Mauro, Trask, Nathaniel A.
Despite their importance, such theorems offer no explanation for the advantages of neural networks, let alone deep neural networks, over classical approximation methods, since universal approximation properties are enjoyed by polynomials (Cheney and Light, 2009) as well as single-layer neural networks (Cybenko, 1989). To address this, a recent thread has emerged in the literature concerning optimal approximation with deep ReLU networks, where the error under an optimal choice of weights and biases is bounded from above in terms of the width and depth of the network. For example, using the "sawtooth" function of Telgarsky (2015), Yarotsky (2017) constructed a ReLU network emulator of the multiplication map (x, y) ↦ xy that is exponentially accurate in the number of layers. This construction yields upper bounds on optimal approximation via DNN emulation of polynomial approximation. Building on these ideas, Opschoor et al. (2019) proved that deep ReLU networks can emulate adaptive hp-finite element approximation, with greater depth permitting p-refinement and hence exponential convergence rates. In a related contribution, He et al. (2018) reinterpreted single-hidden-layer ReLU networks as r-adaptive piecewise linear finite element spaces.
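As a minimal illustrative sketch (not code from the paper), the following NumPy snippet reproduces the essence of Yarotsky's construction: the hat function is composed to form Telgarsky's sawtooth, a truncated telescoping sum of sawtooths approximates x^2 on [0, 1] with error decaying like 4^{-m} in the depth m, and the polarization identity xy = ((x+y)^2 - x^2 - y^2)/2 recovers multiplication. The names hat, square_approx, and mult_approx are our own, not from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # One tooth on [0, 1], written exactly as a width-3 ReLU layer.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def square_approx(x, m):
    # Yarotsky's depth-m ReLU emulator of x^2 on [0, 1]:
    #   f_m(x) = x - sum_{s=1}^m g_s(x) / 2^(2s),
    # where g_s is the s-fold composition of the hat function
    # (Telgarsky's sawtooth with 2^(s-1) teeth). The truncation
    # error is bounded by 2^(-2m-2), i.e. exponential in depth.
    g, out = x, np.copy(x)
    for s in range(1, m + 1):
        g = hat(g)
        out -= g / 4.0**s
    return out

def mult_approx(x, y, m):
    # Multiplication via the polarization identity
    #   xy = ((x + y)^2 - x^2 - y^2) / 2,
    # restricted here to x, y in [0, 1/2] so that x + y stays in [0, 1].
    return 0.5 * (square_approx(x + y, m)
                  - square_approx(x, m)
                  - square_approx(y, m))

x = np.linspace(0.0, 0.5, 101)
y = 0.37 * np.ones_like(x)
for m in (2, 4, 8):
    err = np.max(np.abs(mult_approx(x, y, m) - x * y))
    print(f"depth m = {m}: max error = {err:.2e}")  # shrinks ~4x per layer
```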
Dec-10-2019