Volume-preserving Neural Networks: A Solution to the Vanishing Gradient Problem
MacDonald, Gordon, Godbout, Andrew, Gillcash, Bryn, Cairns, Stephanie
Department of Mathematics and Statistics, McGill University, Montreal, QC H3A 0E9, Canada

Abstract

We propose a novel approach to addressing the vanishing (or exploding) gradient problem in deep neural networks. We construct a new architecture for deep neural networks in which every layer except the output layer is a combination of rotation, permutation, diagonal, and activation sublayers, all of which are volume preserving. This control on the volume forces the gradient (on average) to maintain equilibrium and not explode or vanish. Volume-preserving neural networks train reliably, quickly, and accurately, and the learning rate is consistent across layers in deep volume-preserving neural networks. To demonstrate this, we apply our volume-preserving neural network model to two standard datasets.

Keywords: volume-preserving, neural network, machine learning, deep learning, vanishing gradient problem

1. Introduction

Deep neural networks are characterized by the composition of a large number of functions (also known as layers), each typically consisting of an affine transformation followed by a non-affine "activation function". Each layer is determined by a number of parameters, which are trained on data to approximate some function. The depth refers to the number of such functions composed (that is, the number of layers). The number of layers required for a network to count as deep is not well-defined, but an overview of deep learning (Schmidhuber, 2015) states that any

arXiv:1911.09576v2
Nov-22-2019
- Country:
- North America > Canada
- Quebec > Montreal (0.54)
- Prince Edward Island > Queens County
- Charlottetown (0.04)
- Genre:
- Research Report (0.84)
- Technology: