Rethinking Gauss-Newton for learning over-parameterized models