Appendix: On Infinite-Width Hypernetworks

Neural Information Processing Systems 

The variance was computed empirically over k = 100 normally distributed samples w. As can be seen, the variance of the kernel tends to zero only when both widths increase. The hyperkernel used corresponds to the infinite-width limit of the same architecture.

As can be seen in Figure 1, when f is wide and kept fixed, there is a clear improvement in test performance as the width of g increases, for every learning rate at which the networks achieve non-trivial performance. When f is wide and kept fixed, a deeper g incurs slower training and lower overall test performance.
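The variance estimate described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact setup: it assumes a one-hidden-layer ReLU hypernetwork g that generates the first-layer weights of a one-hidden-layer ReLU network f, with 1/sqrt(width) scalings so an infinite-width limit exists; the empirical kernel is the normalized inner product of f's hidden features, and its variance is measured over k independent draws of the outer weights w.

```python
import numpy as np

def empirical_kernel(x, xp, z, width_f, width_g, rng):
    """Empirical kernel phi(x) . phi(x') / width_f for one random weight draw.

    g (one hidden layer, width_g) maps a task code z to the first-layer
    weights of f (hidden width width_f). All outer weights are i.i.d. N(0, 1)
    with 1/sqrt(fan_in) scaling; this is an illustrative parametrization.
    """
    d_z, d_x = z.shape[0], x.shape[0]
    A = rng.standard_normal((width_g, d_z))            # g: input -> hidden
    B = rng.standard_normal((width_f * d_x, width_g))  # g: hidden -> weights of f
    h = np.maximum(A @ z / np.sqrt(d_z), 0.0)          # g's hidden activations
    W1 = (B @ h / np.sqrt(width_g)).reshape(width_f, d_x)  # generated weights
    phi = np.maximum(W1 @ x / np.sqrt(d_x), 0.0)       # f's features at x
    phip = np.maximum(W1 @ xp / np.sqrt(d_x), 0.0)     # f's features at x'
    return phi @ phip / width_f

def empirical_kernel_variance(width_f, width_g, k=100, seed=0):
    """Variance of the empirical kernel over k i.i.d. draws of the weights."""
    rng = np.random.default_rng(seed)
    x, xp = rng.standard_normal(8), rng.standard_normal(8)
    z = rng.standard_normal(4)
    vals = [empirical_kernel(x, xp, z, width_f, width_g, rng) for _ in range(k)]
    return float(np.var(vals))
```

In this sketch, widening f alone is not enough: the generated rows of W1 all depend on g's few hidden units, so the kernel keeps fluctuating across draws until the width of g grows as well, mirroring the claim that the variance vanishes only when both widths increase.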
