the tangent kernel cannot be explained from the point of view of "lazy training": when the last layer is non-linear, the