A Novel Structured Natural Gradient Descent for Deep Learning

Sep-21-2021, arXiv.org Artificial Intelligence

To make natural gradient methods fast enough for practical use, we need natural gradient algorithms with low computational complexity and low storage requirements. This paper proposes a structured natural gradient optimization method (SNGD) for training deep neural networks. SNGD first reconstructs the parameter layers of the deep network by adding a new processing layer, named the local Fisher layer, and then optimizes the reconstructed network with traditional gradient descent (GD). This is equivalent to optimizing the original network with natural gradient descent (NGD), and it effectively reduces the computational complexity of NGD. The local Fisher layer captures the curvature information of the loss function and adds a curvature-dependent adjustment to the original gradient direction, which keeps the parameter change in each update reasonable and improves the convergence speed of the parameters. We test the proposed approach on…

The main contributions of this paper are as follows:

1) By adding a new local Fisher layer to reconstruct the network, the computation involving the global Fisher matrix is decomposed, so that traditional GD on the reconstructed network achieves the effect of NGD.
2) A new layer, the local Fisher layer, and an efficient implementation scheme for it are proposed. By introducing second-order information, the local Fisher layer accounts for the different properties of parameters at different positions and constrains the transformation of the model parameters, so that gradient updates can be carried out stably and quickly.
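The claim that plain GD on a reconstructed model can match NGD on the original one can be illustrated with a generic reparameterization identity. The sketch below is not the paper's local Fisher layer or its implementation scheme. It assumes a toy linear least-squares model, a damped Fisher matrix F = XᵀX/n + λI, and a hypothetical fixed linear "extra layer" θ = Aφ with A = F^(-1/2). All function names, the damping value, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def loss_grad(theta, X, y):
    """Gradient of 0.5 * mean((X @ theta - y)**2) with respect to theta."""
    residual = X @ theta - y
    return X.T @ residual / len(y)

def fisher(X, damping=1e-3):
    """Fisher matrix of a linear-Gaussian model with unit noise, plus damping (assumed)."""
    n, d = X.shape
    return X.T @ X / n + damping * np.eye(d)

def inv_sqrt(F):
    """Symmetric inverse square root F^(-1/2) via eigendecomposition."""
    w, V = np.linalg.eigh(F)
    return (V / np.sqrt(w)) @ V.T

def ngd_step(theta, X, y, lr):
    """One natural gradient step: theta - lr * F^(-1) @ grad."""
    return theta - lr * np.linalg.solve(fisher(X), loss_grad(theta, X, y))

def reparam_gd_step(theta, X, y, lr):
    """One plain GD step on phi, where theta = A @ phi and A = F^(-1/2)."""
    A = inv_sqrt(fisher(X))                   # hypothetical extra linear layer
    phi = np.linalg.solve(A, theta)           # current value in the new coordinates
    grad_phi = A.T @ loss_grad(theta, X, y)   # chain rule through theta = A @ phi
    return A @ (phi - lr * grad_phi)          # map the updated phi back to theta

# Synthetic, badly scaled regression problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([1.0, 2.0, 5.0, 10.0, 0.5])
y = X @ rng.normal(size=5)
theta0 = np.zeros(5)

theta_ngd = ngd_step(theta0, X, y, lr=1.0)
theta_gd = reparam_gd_step(theta0, X, y, lr=1.0)
print(np.allclose(theta_ngd, theta_gd))
```

Because A Aᵀ = F⁻¹, the GD update on φ maps back to θ − η F⁻¹∇L, so the two steps agree up to floating-point error. In this toy version A is formed from the full Fisher matrix, which is exactly the cost that the abstract says SNGD avoids by decomposing the global Fisher computation into local layers.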

gradient descent, neural network, optimization method, (12 more...)

Country:
- Asia > China > Beijing > Beijing (0.05)

Genre:
- Research Report (0.65)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (1.00)
  - Machine Learning
    - Statistical Learning > Gradient Descent (0.90)
    - Neural Networks > Deep Learning (0.87)