Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks