Appendix: On Infinite-Width Hypernetworks
The variance was computed empirically over k = 100 normally distributed samples w. As can be seen, the variance of the kernel tends to zero only when both widths increase. The hyperkernel used corresponds to the infinite-width limit of the same architecture. As can be seen in Figure 1, when f is wide and kept fixed, there is a clear improvement in test performance as the width of g increases, for every learning rate at which the networks achieve non-trivial performance. When f is wide and kept fixed, a deeper g incurs slower training and lower overall test performance.
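The following is a minimal sketch, not the authors' implementation, of the variance estimate described above: the kernel entry f(x; g(z)) f(x'; g(z)) is recomputed for k = 100 independent standard-normal draws of the hypernetwork weights w, and the sample variance across draws is reported. The specific widths, the two-layer architectures for f and g, the ReLU nonlinearity, and the 1/sqrt(width) scaling are illustrative assumptions.

```python
import numpy as np

def hyper_kernel_sample(x1, x2, z, d_g, d_f, rng):
    """One draw of the kernel entry K(x1, x2) = f(x1; theta) * f(x2; theta),
    where theta = g(z; w) and the hypernetwork weights w are standard normal."""
    d_in, d_z = x1.shape[0], z.shape[0]
    n_theta = d_f * d_in + d_f                      # hidden weights + readout of a 2-layer f
    # Hypernetwork g: z -> theta, a 2-layer ReLU MLP with 1/sqrt(fan-in) scaling (assumed).
    W1 = rng.standard_normal((d_g, d_z))
    W2 = rng.standard_normal((n_theta, d_g))
    h = np.maximum(W1 @ z / np.sqrt(d_z), 0.0)
    theta = W2 @ h / np.sqrt(d_g)
    # Primary network f: x -> scalar, with its weights taken from theta.
    Wf = theta[: d_f * d_in].reshape(d_f, d_in)
    vf = theta[d_f * d_in:]
    def f(x):
        return vf @ np.maximum(Wf @ x / np.sqrt(d_in), 0.0) / np.sqrt(d_f)
    return f(x1) * f(x2)

def kernel_variance(d_g, d_f, k=100, d_in=8, d_z=4, seed=0):
    """Sample variance of the kernel entry over k weight draws, as in the text."""
    rng = np.random.default_rng(seed)
    x1, x2 = rng.standard_normal(d_in), rng.standard_normal(d_in)
    z = rng.standard_normal(d_z)
    samples = [hyper_kernel_sample(x1, x2, z, d_g, d_f, rng) for _ in range(k)]
    return np.var(samples, ddof=1)

# The variance should shrink only when BOTH widths grow, as stated above.
for d_g, d_f in [(16, 16), (1024, 16), (16, 1024), (1024, 1024)]:
    print(f"width of g = {d_g:5d}, width of f = {d_f:5d}, "
          f"kernel variance = {kernel_variance(d_g, d_f):.4f}")
```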