Neural Information Processing Systems 

Failure cases of GET. It is worth noting that the Gaussian equivalence property (Theorem 3) may no longer hold if we train the features for longer. In particular, because of our mean-field parameterization, the first-layer weight W needs to travel sufficiently far from initialization to achieve small training loss (see Figure 2). Hence in our experimental simulations (where n, d, N are large but finite), we expect the Gaussian equivalence predictions to become inaccurate at some point as the number of steps t increases. This transition is empirically demonstrated in Figure 4(a).
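To make the "distance from initialization" quantity concrete, the following is a minimal sketch, not the setup used in the experiments. It trains only the first layer of a two-layer network with a 1/N mean-field output scaling and records the Frobenius distance between W at step t and W at initialization alongside the training loss. The function name `track_first_layer_drift`, the tanh activation, the single-index teacher, the fixed random-sign second layer, the problem sizes, and the step size are all assumptions chosen for illustration.

```python
import numpy as np


def track_first_layer_drift(n=200, d=50, N=100, steps=50, lr=0.2, seed=0):
    """Full-batch gradient descent on the first layer only.

    Returns two arrays of length steps + 1:
    drift[t] = ||W_t - W_0||_F and loss[t] = training MSE / 2 at step t.
    All modelling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Synthetic data from a hypothetical single-index teacher.
    X = rng.standard_normal((n, d))
    beta = rng.standard_normal(d) / np.sqrt(d)
    y = np.tanh(X @ beta)

    # First layer at initialization; second layer fixed to random signs.
    W0 = rng.standard_normal((d, N)) / np.sqrt(d)
    a = rng.choice([-1.0, 1.0], size=N)
    W = W0.copy()

    drift, loss = [], []
    for _ in range(steps + 1):
        H = np.tanh(X @ W)          # hidden features, shape (n, N)
        pred = H @ a / N            # mean-field 1/N output scaling
        resid = pred - y

        loss.append(0.5 * np.mean(resid ** 2))
        drift.append(np.linalg.norm(W - W0))

        # Gradient of the loss with respect to W, using tanh' = 1 - tanh^2.
        grad = X.T @ (resid[:, None] * (1.0 - H ** 2) * a[None, :]) / (n * N)

        # The 1/N output scaling makes per-neuron gradients O(1/N), so the
        # step is multiplied by N to let each neuron move at an O(1) rate.
        W -= lr * N * grad

    return np.array(drift), np.array(loss)


if __name__ == "__main__":
    drift, loss = track_first_layer_drift()
    for t in (0, 10, 25, 50):
        print(f"t={t:3d}  ||W_t - W_0||_F = {drift[t]:.3f}  loss = {loss[t]:.4f}")
```

Plotting `drift` against `loss` over t gives a rough picture of how far W has moved by the time the training loss becomes small. The sketch does not compute any Gaussian equivalence prediction, so it cannot locate the step at which such a prediction would break down.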
