On the training dynamics of deep networks with $L_2$ regularization