A note on Linear Bottleneck networks and their Transition to Multilinearity

Libin Zhu, Parthe Pandit, Mikhail Belkin

arXiv.org Machine Learning 

For a wide neural network (WNN), when the network width is sufficiently large, there exists a linear function of the parameters that is arbitrarily close to the network function within a ball of radius O(1) in parameter space around random initialization. This local linearity explains the equivalence between training wide neural networks with small learning rates and neural tangent kernel (NTK) regression, first shown in [13]. However, an important assumption for this transition to linearity [18] to hold is that every layer is sufficiently wide. If even one hidden layer is a narrow "bottleneck", yielding a so-called bottleneck neural network (BNN), [18] showed that the transition to linearity does not occur. An immediate question is then: what functions of the weights does a neural network with a bottleneck layer represent?
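As a rough illustration of the transition to linearity described above (not an experiment from the paper), the sketch below compares a network's output at a perturbed parameter vector with its first-order Taylor expansion around initialization, for a perturbation of norm 1. Under the NTK-style 1/sqrt(width) initialization assumed here, the gap should shrink as the hidden widths grow for a fully wide network, but stay of order one once a narrow bottleneck layer is inserted. The function names, widths, and tanh activation are illustrative choices, not taken from the paper.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree


def make_mlp(widths, key):
    """Initialize an MLP with the given layer widths, scaling each
    weight matrix by 1/sqrt(fan_in) (NTK-style initialization)."""
    keys = jax.random.split(key, len(widths) - 1)
    return [jax.random.normal(k, (m, n)) / jnp.sqrt(m)
            for k, (m, n) in zip(keys, zip(widths[:-1], widths[1:]))]


def forward(params, x):
    """Scalar network output f(params; x) for a fixed input x."""
    h = x
    for W in params[:-1]:
        h = jnp.tanh(h @ W)
    return (h @ params[-1]).squeeze()


def linearity_gap(params0, x, key, radius=1.0):
    """Gap between f at a perturbed point and its first-order Taylor
    expansion around params0, for a perturbation of norm `radius`."""
    flat0, unravel = ravel_pytree(params0)
    direction = jax.random.normal(key, flat0.shape)
    direction = radius * direction / jnp.linalg.norm(direction)

    f = lambda flat: forward(unravel(flat), x)
    f0, grad0 = f(flat0), jax.grad(f)(flat0)
    taylor = f0 + grad0 @ direction  # linear model in the parameters
    return jnp.abs(f(flat0 + direction) - taylor)


key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (10,))

# Wide network: all hidden layers wide, so the gap should shrink as width grows.
wide = make_mlp([10, 2048, 2048, 1], key)
# Bottleneck network: one narrow (width-4) hidden layer, so the gap stays O(1).
bottleneck = make_mlp([10, 2048, 4, 2048, 1], key)

for name, params in [("wide", wide), ("bottleneck", bottleneck)]:
    print(name, float(linearity_gap(params, x, jax.random.PRNGKey(1))))
```

The exact numbers depend on the seed, widths, and activation; the point of the sketch is only the qualitative contrast between the two architectures as the wide layers are made wider.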
