back-matching propagation
Reviews: On the Local Hessian in Back-propagation
The authors propose that backpropagation with respect to a loss function is equivalent to a single step of a "back-matching propagation" procedure in which, after a forward evaluation, the weights and input activations of each block are alternately optimized to minimize a loss on the block's output. They argue that architectures and training procedures that improve the condition number of the Hessian of this back-matching loss train more efficiently, and they support this by analytically studying the effects of orthonormal initialization, skip connections, and batch-norm. They offer further evidence for this characterization by designing a blockwise learning-rate scaling method based on an approximation of the back-matching loss and demonstrating an improved learning curve for VGG13 on CIFAR10 and CIFAR100.
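The link between orthonormal initialization and conditioning can be illustrated with a minimal sketch. It assumes a single linear block and a squared-error matching loss 0.5 * ||W x - target||^2, neither of which the summary above specifies; under that assumption the Hessian with respect to the block input x is W^T W. The names `input_hessian` and `orthonormal_init` are hypothetical helpers, not taken from the paper.

```python
import numpy as np

def input_hessian(W):
    # Assumption: for the loss 0.5 * ||W x - target||^2, the Hessian
    # with respect to the input x is W^T W.
    return W.T @ W

def orthonormal_init(n, rng):
    # The QR decomposition of a Gaussian matrix yields an orthogonal factor Q.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q

rng = np.random.default_rng(0)
n = 64  # hypothetical block width

W_gauss = rng.standard_normal((n, n)) / np.sqrt(n)  # scaled Gaussian init
W_orth = orthonormal_init(n, rng)                   # orthonormal init

cond_gauss = np.linalg.cond(input_hessian(W_gauss))
cond_orth = np.linalg.cond(input_hessian(W_orth))

print(f"Gaussian init:    cond(W^T W) = {cond_gauss:.3e}")
print(f"Orthonormal init: cond(W^T W) = {cond_orth:.3e}")
```

With an orthogonal W, W^T W is the identity, so its condition number is 1 up to floating-point error, whereas a square Gaussian matrix typically gives a much larger value. Real blocks include nonlinearities and normalization, which this toy case leaves out.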
Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation
Zhang, Huishuai, Chen, Wei, Liu, Tie-Yan
Stochastic gradient descent (SGD) has achieved great success in training deep neural networks, where the gradient is computed through back-propagation. However, the back-propagated values of different layers vary dramatically. This inconsistency of gradient magnitude across layers makes optimizing a deep neural network with a single learning rate problematic. We introduce back-matching propagation, which computes the backward values on a layer's parameters and input by matching the backward values on the layer's output. This leads to solving a set of least-squares problems, which is computationally expensive. We then simplify back-matching propagation with approximations and propose an algorithm that turns out to be regular SGD with a layer-wise adaptive learning rate strategy. This allows an easy implementation of our algorithm in current machine learning frameworks equipped with auto-differentiation. We apply our algorithm to training modern deep neural networks and achieve favorable results over SGD.
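The step from least-squares matching to a layer-wise learning rate can be sketched for one linear layer Y = X W. This is an illustration under stated assumptions, not the authors' algorithm: the abstract gives neither the exact matching objective nor the approximation used. Here the parameter update is assumed to minimize ||X dW - (-G)||_F^2, where G is the backward value on the layer output, and the approximation replaces X^T X by a scalar multiple of the identity. All sizes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
B, d, k = 128, 16, 8                    # batch size, input width, output width (hypothetical)
X = rng.standard_normal((B, d)) * 3.0   # layer input for one mini-batch
G = rng.standard_normal((B, k))         # backward value on the layer output

# Ordinary back-propagation: gradient of the loss w.r.t. W for Y = X @ W.
grad_W = X.T @ G

# Assumed back-matching step on the parameter: choose dW so that
# X @ dW matches the desired output change -G in the least-squares sense.
dW_match, *_ = np.linalg.lstsq(X, -G, rcond=None)

# The least-squares solution equals the gradient preconditioned by (X^T X)^{-1}.
dW_precond = -np.linalg.solve(X.T @ X, grad_W)

# Illustrative approximation (an assumption): replace X^T X by its mean
# diagonal entry times the identity. The update is then plain SGD with a
# layer-specific step size 1 / scale.
scale = np.trace(X.T @ X) / d
dW_approx = -grad_W / scale

# Cosine similarity between the exact and the approximated update.
cos = np.sum(dW_match * dW_approx) / (
    np.linalg.norm(dW_match) * np.linalg.norm(dW_approx)
)
print(f"layer-wise step size: {1.0 / scale:.3e}")
print(f"cosine(exact, approximate): {cos:.3f}")
```

Under this scalar approximation, each layer only needs one extra number per step, which is why such a scheme fits naturally into frameworks with auto-differentiation: the gradient is computed as usual and then rescaled per layer.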