High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki

Neural Information Processing Systems 

We consider two scalings of the first-step learning rate η. For small η, we establish a Gaussian equivalence property for the trained feature map, and prove that the learned kernel improves upon the initial random features model but cannot defeat the best linear model on the input.
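The setting described above can be illustrated numerically: take a two-layer network, apply one full-batch gradient step with learning rate η to the first-layer weights, then fit ridge regression on the (fixed) trained features and compare against the initial random features. The sketch below is an illustrative simulation under assumed choices (ReLU activation, a tanh single-index teacher, and specific dimensions), not the paper's exact asymptotic setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 64, 256, 2048   # input dim, network width, sample size (illustrative)
eta = 1.0                 # first-step learning rate

# Assumed teacher: a nonlinear single-index target y = tanh(w*.x)
w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = np.tanh(X @ w_star)

# Two-layer student: f(x) = a . relu(W x / sqrt(d)) / sqrt(N)
W = rng.standard_normal((N, d))
a = rng.choice([-1.0, 1.0], size=N)

def features(X, W):
    # First-layer feature map with ReLU activation
    return np.maximum(X @ W.T / np.sqrt(d), 0.0)

def ridge_mse(Phi_tr, y_tr, Phi_te, y_te, lam=1e-3):
    # Ridge regression on a fixed feature map; returns test MSE
    A = Phi_tr.T @ Phi_tr + lam * np.eye(Phi_tr.shape[1])
    beta = np.linalg.solve(A, Phi_tr.T @ y_tr)
    return np.mean((Phi_te @ beta - y_te) ** 2)

# One full-batch gradient step on W under squared loss
Phi = features(X, W)
resid = Phi @ a / np.sqrt(N) - y
grad_act = (X @ W.T / np.sqrt(d) > 0).astype(float)   # relu'(pre-activation)
# dL/dW_{jk} = (1/n) sum_i resid_i * a_j * relu'(w_j.x_i/sqrt(d)) * x_{ik} / (sqrt(N) sqrt(d))
G = ((resid[:, None] * grad_act) * a[None, :]).T @ X / (n * np.sqrt(N) * np.sqrt(d))
W1 = W - eta * G

# Compare held-out ridge fits: initial random features vs. one-step-trained features
X_te = rng.standard_normal((n, d))
y_te = np.tanh(X_te @ w_star)
mse_rf = ridge_mse(features(X, W),  y, features(X_te, W),  y_te)
mse_tr = ridge_mse(features(X, W1), y, features(X_te, W1), y_te)
print(f"random features test MSE: {mse_rf:.4f}, after one step: {mse_tr:.4f}")
```

Varying η in this sketch mirrors the two regimes the abstract distinguishes: for small η the change in the feature map is a small perturbation, while larger η scalings allow the first layer to align more strongly with the target direction.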
