Channel-Directed Gradients for Optimization of Convolutional Neural Networks

Dong Lao, Peihao Zhu, Peter Wonka, Ganesh Sundaramoorthi

arXiv.org Artificial Intelligence 

We introduce optimization methods for convolutional neural networks (CNNs) that improve existing gradient-based optimization in terms of generalization error. The method requires only simple processing of existing stochastic gradients, can be used in conjunction with any optimizer, and incurs only a linear overhead (in the number of parameters) compared to computing the stochastic gradient. We show that defining gradients along the output-channel direction leads to a performance boost, while other directions can be detrimental. We present the continuum theory of such gradients, its discretization, and its application to deep networks. Experiments on benchmark datasets with several networks and baseline optimizers show that optimizers can be improved in generalization error simply by computing the stochastic gradient with respect to output-channel directed metrics.

Stochastic gradient descent (SGD) is currently the dominant algorithm for optimizing large-scale CNNs LeCun et al. (1998); Simonyan & Zisserman (2014); He et al. (2016b). Although there has been considerable activity in optimization methods seeking to improve performance, SGD still dominates large-scale CNN optimization in terms of its generalization ability. Despite SGD's dominance, a gap often remains between training accuracy and real-world test accuracy in applications, which motivates research into optimization methods that increase generalization accuracy.
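As a rough illustration of the idea described above, the sketch below processes the stochastic gradient of each convolutional layer before the optimizer step by smoothing it along the output-channel axis, leaving the base optimizer unchanged. This is a minimal, hypothetical Python/PyTorch sketch, not the paper's exact formulation: the Gaussian smoothing kernel, the sigma parameter, and the helper names smooth_along_output_channels and apply_channel_directed_gradients are illustrative assumptions.

# Hypothetical sketch: smooth conv-layer gradients along the output-channel axis
# before the optimizer step. The choice of a Gaussian kernel and its width are
# illustrative assumptions, not the paper's exact channel-directed metric.

import torch
import torch.nn as nn
import torch.nn.functional as F


def smooth_along_output_channels(grad: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Smooth a conv-weight gradient of shape (C_out, C_in, kH, kW) along dim 0."""
    c_out = grad.shape[0]
    radius = max(1, int(3 * sigma))
    # 1-D Gaussian kernel over the output-channel index (an illustrative choice).
    x = torch.arange(-radius, radius + 1, dtype=grad.dtype, device=grad.device)
    kernel = torch.exp(-0.5 * (x / sigma) ** 2)
    kernel = kernel / kernel.sum()
    # Flatten all non-channel dims and treat them as a batch for a 1-D convolution.
    flat = grad.reshape(c_out, -1).t().unsqueeze(1)            # (N, 1, C_out)
    smoothed = F.conv1d(flat, kernel.view(1, 1, -1), padding=radius)
    return smoothed.squeeze(1).t().reshape_as(grad)


def apply_channel_directed_gradients(model: nn.Module, sigma: float = 1.0) -> None:
    """Process conv-layer gradients in place; call after backward(), before step()."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d) and module.weight.grad is not None:
            module.weight.grad.copy_(
                smooth_along_output_channels(module.weight.grad, sigma)
            )


# Usage with any base optimizer, e.g. SGD:
#   loss.backward()
#   apply_channel_directed_gradients(model, sigma=1.0)
#   optimizer.step()

Because the smoothing is a fixed-size 1-D filter applied once per parameter tensor, the extra cost is linear in the number of parameters, consistent with the overhead stated in the abstract.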
