On the training dynamics of deep networks with $L_2$ regularization
