Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks
