Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
Liu, Jinlong, Jiang, Guoqing, Bai, Yunzhi, Chen, Ting, Wang, Huayan
GSNR of a parameter is defined as the ratio between its gradient's squared mean and Previous work (Zhang et al., 2016; Hardt et al., 2015; Dziugaite & Roy, 2017) suggests that the The GSNR of a parameter is defined as the ratio between its gradient's squared mean and variance Previous work tried to use GSNR to conduct theoretical analysis on deep learning. For example, Rainforth et al. (2018) used GSNR to analyze variational bounds in Intuitively, GSNR measures the similarity of a parameter's gradients among different training samples. To reveal the mechanism of DNNs' good generalization ability, we show that the gradient descent We believe this is probably the key to DNNs' remarkable generalization ability. In the remainder of this paper we first analyze the relation between GSNR and generalization (Section 2). At a particular point of the parameter space, GSNR measures the consistency of a parameter's gradients across different data samples.
Jan-21-2020