the tangent kernel cannot be explained from the point of view of "lazy training": when the last layer is non-linear, the

Neural Information Processing Systems 

We thank all reviewers for their insightful and encouraging comments. Theorem 3.2 and the results in Appendix G have been proved previously (e.g., [1]). Our Hessian analysis results, including Theorem 3.2 and Theorem 3.1, are new. We acknowledge that this may cause confusion. The paper focuses mostly on the squared loss, whereas widely deployed NNs use the softmax cross-entropy loss.