d9731321ef4e063ebbee79298fa36f56-AuthorFeedback.pdf
Neural Information Processing Systems
Our analysis provides full distribution information on the joint outputs. Furthermore, the distribution of the cosine similarity explains why moderately deep and wide ReLU networks can be trained despite negative results by mean field (MF) analysis based on correlations. There, the normal distribution originates from the MF limit. In contrast, here we understand that the output distribution is completely determined by the empirical covariance matrix of inputs. This is rather obvious however. Instead, we refer to the rich literature on linear neural networks at initialization.
Feb-14-2026, 12:40:47 GMT