Supplementary Material Appendices

Neural Information Processing Systems 

Throughout the paper, our analysis is based on the NTK parameterization [10], under which the constancy of the tangent kernel was originally observed. In this section, we show that different parameterization strategies (e.g., LeCun initialization [LeCun et al.]