[R1/R2] Infinite width assumption: the infinite width assumption is needed due to the technical detail that the norm
–Neural Information Processing Systems
We thank the reviewers for their valuable comments and respond to the main concerns below. Similar to Zhang et al. [31], we chose a 10k-block ResNet to stress the [...]. We will rephrase L243 to better express this. The derivative of the weights depends on this term due to the chain rule. We will make this explicit in the revised manuscript.
Feb-14-2026, 19:36:35 GMT