Review for NeurIPS paper: On the linearity of large non-linear models: when and why the tangent kernel is constant

Neural Information Processing Systems 

Additional Feedback: [Post Author Response] I thank the authors for responding to my concerns and questions, which helped me appreciate the paper better. As the authors clarified, there won't be any issue with dual submission. I think this is a good submission that will be of general interest to the NeurIPS community, and I suggest accepting it. Regarding softmax, I agree with the authors that the paper's analysis holds when the output is a softmax. It would be interesting to see what happens with the softmax nonlinearities that appear in the self-attention layers of Transformer architectures.