Reviews: Can We Gain More from Orthogonality Regularizations in Training Deep Networks?

Neural Information Processing Systems 

In extensive experiments with state-of-the-art models, the paper shows that soft orthogonality regularization can improve training stability and yield better classification accuracy than the same models trained without it. The paper proposes a method to approximately enforce all singular values of the weight matrices to be equal to 1, using a sampling-based approach that avoids an expensive SVD computation.

Major comments: This paper presents interesting experiments showing that regularizing weights toward orthogonality can stabilize and speed up learning, particularly near the beginning of training, and can improve final test accuracy in several large models. These results could be of broad interest. One concern with the experimental methodology is that some methods use carefully sculpted hyperparameter trajectories. How were these trajectories selected?
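The review does not spell out the SVD-free mechanism, but one plausible reading of "sampling-based" is a power-iteration estimate of how far W^T W is from the identity, penalizing the largest deviation of any singular value from 1. A minimal PyTorch sketch under that assumption; the function name `srip_like_penalty`, the iteration count, and the conv-kernel flattening convention are all illustrative, not taken from the paper:

```python
import torch

def srip_like_penalty(W, n_iters=3):
    """Estimate the spectral norm of W^T W - I via power iteration.

    Hypothetical sketch: starting from a random vector (the "sampling"
    step) avoids a full SVD while still approximating the worst-case
    deviation of W's singular values from 1.
    """
    # Flatten conv kernels to 2-D: (out_channels, everything else)
    W = W.reshape(W.shape[0], -1)
    # Work with the smaller Gram matrix for efficiency
    if W.shape[0] < W.shape[1]:
        W = W.t()
    G = W.t() @ W - torch.eye(W.shape[1], device=W.device)
    # Power iteration from a random unit vector
    v = torch.randn(G.shape[1], device=W.device)
    v = v / v.norm()
    for _ in range(n_iters):
        v = G @ v
        v = v / (v.norm() + 1e-12)
    # Rayleigh quotient approximates G's dominant eigenvalue;
    # G is symmetric, so its magnitude is the spectral norm
    return torch.abs(v @ (G @ v))

# Usage: add lambda_reg * srip_like_penalty(layer.weight) to the
# training loss for each weight layer being regularized.
```

Driving this penalty to zero pushes all singular values of W toward 1, which is the soft-orthogonality goal the summary describes.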