Neural Information Processing Systems
Failure cases of GET. The Gaussian equivalence property (Theorem 3) may no longer hold if the features are trained for longer. In particular, because of our mean-field parameterization, the first-layer weight W needs to travel sufficiently far from initialization to achieve small training loss (see Figure 2). Hence in our experimental simulations (where n, d, N are large but finite), we expect the Gaussian equivalence predictions to become inaccurate at some point as the number of steps t increases. This transition is demonstrated empirically in Figure 4(a).
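The quantity this paragraph turns on, the distance of W from its initialization as a function of the step count t, can be tracked with a small simulation. The following is a minimal sketch only, not the paper's experimental setup: the network form f(x) = (1/N) Σ a_i tanh(⟨w_i, x⟩), the tanh teacher, the fixed ±1 second layer, the learning-rate scaling by N, and the function name `train_first_layer` are all illustrative assumptions.

```python
import math
import random

def train_first_layer(n=40, d=8, N=16, steps=20, lr=1.0, seed=0):
    """Run gradient descent on W only; return ||W_t - W_0||_F for t = 0..steps."""
    rng = random.Random(seed)
    # Synthetic data (assumption): Gaussian inputs, labels from a tanh single-index teacher.
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    beta = [1 / math.sqrt(d)] * d
    y = [math.tanh(sum(b * x for b, x in zip(beta, xi))) for xi in X]
    # Initialization (assumption): Gaussian rows of norm about 1; second layer fixed at +-1.
    W0 = [[rng.gauss(0, 1) / math.sqrt(d) for _ in range(d)] for _ in range(N)]
    a = [rng.choice([-1.0, 1.0]) for _ in range(N)]
    W = [row[:] for row in W0]
    dists = [0.0]
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(N)]
        for xi, yi in zip(X, y):
            pre = [sum(w * x for w, x in zip(row, xi)) for row in W]
            # Mean-field scaling: the output is an average over the N neurons.
            out = sum(ai * math.tanh(p) for ai, p in zip(a, pre)) / N
            err = out - yi
            for i in range(N):
                # Gradient of 0.5 * err^2 with respect to w_i, averaged over samples.
                g = err * a[i] * (1 - math.tanh(pre[i]) ** 2) / N
                for j in range(d):
                    grad[i][j] += g * xi[j] / n
        # Step size scaled by N (assumption) so per-neuron updates do not vanish with N.
        for i in range(N):
            for j in range(d):
                W[i][j] -= lr * N * grad[i][j]
        dists.append(math.sqrt(sum((W[i][j] - W0[i][j]) ** 2
                                   for i in range(N) for j in range(d))))
    return dists

distances = train_first_layer()
```

Plotting `distances` against the step index gives a rough picture of how far the weights have moved. The sketch says nothing on its own about when the Gaussian equivalence predictions break down; that comparison requires the theoretical curves from the paper.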
Nov-17-2025, 18:29:23 GMT