Appendix: On Infinite-Width Hypernetworks
The variance was computed empirically over k = 100 normally distributed samples w. As can be seen, the variance of the kernel tends to zero only when both widths increase. The hyperkernel used corresponds to the infinite-width limit of the same architecture. As can be seen in Figure 1, when f is wide and kept fixed, there is a clear improvement in test performance as the width of g increases, for every learning rate at which the networks achieve non-trivial performance. When f is wide and kept fixed, a deeper g incurs slower training and lower overall test performance.
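The following is a minimal sketch, not the authors' implementation, of the variance estimate described above: the kernel entry f(x; g(z)) f(x'; g(z)) is recomputed for k = 100 independent standard-normal draws of the hypernetwork weights w, and the sample variance across draws is reported. The specific widths, the two-layer architectures for f and g, the ReLU nonlinearity, and the 1/sqrt(width) scaling are illustrative assumptions.

```python
import numpy as np

def hyper_kernel_sample(x1, x2, z, d_g, d_f, rng):
    """One draw of the kernel entry K(x1, x2) = f(x1; theta) * f(x2; theta),
    where theta = g(z; w) and the hypernetwork weights w are standard normal."""
    d_in, d_z = x1.shape[0], z.shape[0]
    n_theta = d_f * d_in + d_f                      # hidden weights + readout of a 2-layer f
    # Hypernetwork g: z -> theta, a 2-layer ReLU MLP with 1/sqrt(fan-in) scaling (assumed).
    W1 = rng.standard_normal((d_g, d_z))
    W2 = rng.standard_normal((n_theta, d_g))
    h = np.maximum(W1 @ z / np.sqrt(d_z), 0.0)
    theta = W2 @ h / np.sqrt(d_g)
    # Primary network f: x -> scalar, with its weights taken from theta.
    Wf = theta[: d_f * d_in].reshape(d_f, d_in)
    vf = theta[d_f * d_in:]
    def f(x):
        return vf @ np.maximum(Wf @ x / np.sqrt(d_in), 0.0) / np.sqrt(d_f)
    return f(x1) * f(x2)

def kernel_variance(d_g, d_f, k=100, d_in=8, d_z=4, seed=0):
    """Sample variance of the kernel entry over k weight draws, as in the text."""
    rng = np.random.default_rng(seed)
    x1, x2 = rng.standard_normal(d_in), rng.standard_normal(d_in)
    z = rng.standard_normal(d_z)
    samples = [hyper_kernel_sample(x1, x2, z, d_g, d_f, rng) for _ in range(k)]
    return np.var(samples, ddof=1)

# The variance should shrink only when BOTH widths grow, as stated above.
for d_g, d_f in [(16, 16), (1024, 16), (16, 1024), (1024, 1024)]:
    print(f"width of g = {d_g:5d}, width of f = {d_f:5d}, "
          f"kernel variance = {kernel_variance(d_g, d_f):.4f}")
```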