Review for NeurIPS paper: On the linearity of large non-linear models: when and why the tangent kernel is constant

Feb-4-2025, 17:02:37 GMT–Neural Information Processing Systems

Additional Feedback: [Post Author Response] I thank the authors for responding to my concerns and questions, which helped me appreciate the paper more. As the authors clarified, there will be no issues with dual submission. I think this is a good submission that will be of general interest to the NeurIPS community, and I recommend acceptance. Regarding softmax, I agree with the authors that the current paper's analysis holds when softmax is applied at the output. It would be interesting to see what happens with the softmax nonlinearities that appear in the self-attention layers of Transformer architectures.

linearity, non-linear model, tangent kernel
