Reviews: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

Neural Information Processing Systems 

The paper is well written, well structured, and very clear. The experiments are described in detail and provide relevant results. Below we outline some detailed comments on the results. In particular, Chizat and Bach prove that training an NTK-parameterized network is closely approximated by "lazy training" (their terminology for training in the linearized regime). This paper is not referenced in the related work section.
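To make the notion of a linearized model concrete, the sketch below (in JAX) shows the first-order Taylor expansion of a network in its parameters around initialization, which is the object that lazy training tracks. The toy network `net`, the helper `linearize`, and the initialization scaling are illustrative assumptions, not code from the paper or the review.

```python
import jax
import jax.numpy as jnp

# Hypothetical two-layer toy network; weights are scaled by 1/sqrt(fan_in),
# in the spirit of the NTK parameterization discussed above.
def net(params, x):
    w1, w2 = params
    return jnp.dot(jax.nn.relu(jnp.dot(x, w1)), w2)

def linearize(f, params0):
    """First-order Taylor expansion of f in its parameters around params0:
    f_lin(params, x) = f(params0, x) + J(params0, x) @ (params - params0)."""
    def f_lin(params, x):
        # Direction of the expansion in parameter space.
        dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
        # jvp returns f(params0, x) and the Jacobian-vector product
        # along dparams in a single pass.
        f0, df = jax.jvp(lambda p: f(p, x), (params0,), (dparams,))
        return f0 + df
    return f_lin

key1, key2 = jax.random.split(jax.random.PRNGKey(0))
params0 = (jax.random.normal(key1, (3, 8)) / jnp.sqrt(3.0),
           jax.random.normal(key2, (8, 1)) / jnp.sqrt(8.0))
f_lin = linearize(net, params0)

x = jnp.ones((5, 3))
# At initialization the linearized and original networks agree exactly;
# lazy training means they stay close throughout gradient descent.
assert jnp.allclose(f_lin(params0, x), net(params0, x))
```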