A Novel Structured Natural Gradient Descent for Deep Learning
–arXiv.org Artificial Intelligence
Faster training with natural gradient descent (NGD) requires algorithms with low computational complexity and low storage requirements. This paper proposes a structured natural gradient descent method (SNGD) for training deep neural networks. SNGD first restructures the parameter layer of the deep network by adding a new processing layer, called the local Fisher layer, and then optimizes the reconstructed network with traditional gradient descent (GD). This is equivalent to optimizing the original network with NGD, so the computational complexity of NGD is effectively reduced. The local Fisher layer captures curvature information of the loss function space and adds a curvature-dependent adjustment to the original gradient direction, which keeps the parameter change in each update reasonable and improves the convergence speed. We test the proposed approach on…

The main contributions of this paper are as follows:
1) By adding a new local Fisher layer to reconstruct the network, the computation of the global Fisher matrix is decomposed, and the problem is transformed into optimization with traditional GD that achieves the effect of NGD.
2) A new layer, the local Fisher layer, and an efficient implementation scheme for it are proposed. By introducing second-order information, the local Fisher layer accounts for the different properties of parameters at different positions and constrains the transformation of the model parameters, so that gradient updates are stable and fast.
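The equivalence stated above, that plain GD on a reconstructed model can reproduce an NGD step on the original parameters, can be checked numerically in a toy setting. The sketch below is not the paper's local Fisher layer. It assumes a fixed symmetric positive-definite matrix `fisher` standing in for the Fisher matrix, a linear reparameterization `theta = transform @ phi` with `transform` equal to the inverse square root of `fisher`, and a quadratic toy loss. All function and variable names are hypothetical.

```python
import numpy as np


def make_spd(dim, rng):
    """Random symmetric positive-definite matrix (stand-in for a Fisher matrix)."""
    a = rng.standard_normal((dim, dim))
    return a @ a.T + dim * np.eye(dim)


def inv_sqrt(mat):
    """Inverse symmetric square root of an SPD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs / np.sqrt(vals)) @ vecs.T


def loss_grad(theta, hessian, target):
    """Gradient of the toy loss 0.5 * (theta - target)^T H (theta - target)."""
    return hessian @ (theta - target)


def ngd_step(theta, fisher, grad, lr):
    """Natural gradient step on the original parameters: theta - lr * F^{-1} grad."""
    return theta - lr * np.linalg.solve(fisher, grad)


def gd_step_reparam(phi, transform, grad_theta, lr):
    """Plain GD step on phi, where theta = transform @ phi.

    By the chain rule, the gradient with respect to phi is transform.T @ grad_theta.
    """
    return phi - lr * (transform.T @ grad_theta)


rng = np.random.default_rng(0)
dim, lr = 4, 0.1
fisher = make_spd(dim, rng)
hessian = make_spd(dim, rng)
target = rng.standard_normal(dim)
theta0 = rng.standard_normal(dim)

# Assumed stand-in for the extra layer: a fixed linear map F^{-1/2}.
transform = inv_sqrt(fisher)
# Choose phi0 so that the reparameterized model starts at the same theta0.
phi0 = np.linalg.solve(transform, theta0)

grad = loss_grad(theta0, hessian, target)
theta_ngd = ngd_step(theta0, fisher, grad, lr)
theta_gd = transform @ gd_step_reparam(phi0, transform, grad, lr)

# The two updated parameter vectors should agree up to floating-point error.
print(np.allclose(theta_ngd, theta_gd))
```

The check works because a GD step on `phi` moves `theta` by `-lr * transform @ transform.T @ grad`, and `transform @ transform.T` equals the inverse of `fisher` under the assumption above. How SNGD builds and updates its local Fisher layer per network layer is described in the paper itself and is not reproduced here.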
Sep-21-2021