Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks
