Directional convergence and alignment in deep learning

Ziwei Ji, Matus Telgarsky

arXiv.org Machine Learning 

Recent efforts to rigorously analyze the optimization of deep networks have yielded many exciting developments, for instance the neural tangent (Jacot et al., 2018; Du et al., 2018; Allen-Zhu et al., 2018; Zou et al., 2018) and mean-field perspectives (Mei et al., 2019; Chizat and Bach, 2018). These works show that small training and even testing errors are achievable for wide networks, but with finite-width networks they usually require the weights to stay close to initialization in certain norms.

By contrast, practitioners run their optimization methods for as long as their computational budget allows (Shallue et al., 2018), and if the data can be perfectly classified, the parameters are guaranteed to diverge in norm to infinity (Lyu and Li, 2019). This raises the worry that the prediction surface may continually change during training; indeed, even on simple data, as in Figure 1, the prediction surface continues to change after perfect classification is achieved, and even at large width it is not close to the maximum-margin predictor from the neural tangent regime. If the prediction surface never stops changing, then the generalization behavior, adversarial stability, and other crucial properties of the predictor could also be unstable.

In this paper, we resolve this worry by guaranteeing stable convergence behavior of deep networks as training proceeds, despite the growth of the weight vectors to infinity. Concretely:

1. Directional convergence: the parameters converge in direction, which suffices to guarantee convergence of many other relevant quantities, such as the prediction margins (see the sketch below).
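To see why directional convergence controls the prediction margins, here is a minimal sketch, assuming (for illustration only) that the predictor f(x; w) is L-positively homogeneous in the parameters w, as in the homogeneous-network setting of Lyu and Li (2019), and writing \bar{w} for the unit-norm directional limit of the parameters:

\[
\frac{w(t)}{\|w(t)\|} \;\to\; \bar{w}
\qquad\Longrightarrow\qquad
\frac{y_i\, f\big(x_i;\, w(t)\big)}{\|w(t)\|^{L}}
\;=\; y_i\, f\!\left(x_i;\ \frac{w(t)}{\|w(t)\|}\right)
\;\to\; y_i\, f\big(x_i;\, \bar{w}\big),
\]

where the middle equality uses L-positive homogeneity, \( f(x; c\,w) = c^{L} f(x; w) \) for \( c > 0 \), and the limit uses continuity of f in w; thus the normalized margin of every training example converges whenever the parameter direction does.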
