[R1/R2] Infinite width assumption: The infinite width assumption is needed due to the technical detail that the norm

Neural Information Processing Systems 

We thank the reviewers for their valuable comments and respond to the main concerns below.

Similar to Zhang et al. [31], we chose a 10k-block ResNet to stress the

We will rephrase L243 to better express this.

The derivatives of the weights depend on this term due to the chain rule. We will make this explicit in the revised manuscript.