High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki
Neural Information Processing Systems
We consider two scalings of the first-step learning rate η. For small η, we establish a Gaussian equivalence property for the trained feature map and prove that the learned kernel improves upon the initial random feature model but cannot defeat the best linear model on the input.
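
The comparison described above can be sketched in NumPy. This is an illustration only, under assumptions the abstract does not state: a tanh activation, squared loss, a 1/√N output scaling, Gaussian inputs, and a hypothetical single-index teacher. The names `one_step_features` and `ridge_test_mse` are invented for the example, and the problem sizes are arbitrary.

```python
import numpy as np


def one_step_features(X, y, W0, a, eta):
    """One full-batch gradient step on the first-layer weights W.

    Assumed model (not taken from the source): f(x) = a^T tanh(W^T x) / sqrt(N),
    with loss L = (1 / 2n) * sum_i (f(x_i) - y_i)^2 and the second layer `a` held fixed.
    """
    n = X.shape[0]
    N = W0.shape[1]
    Z = X @ W0                                  # (n, N) pre-activations
    resid = np.tanh(Z) @ a / np.sqrt(N) - y     # (n,) prediction errors
    dact = 1.0 - np.tanh(Z) ** 2                # tanh'(Z)
    grad = X.T @ (resid[:, None] * dact * a[None, :]) / (n * np.sqrt(N))
    return W0 - eta * grad


def ridge_test_mse(F_train, y_train, F_test, y_test, lam=1e-2):
    """Fit ridge regression on the given features and return the test MSE."""
    p = F_train.shape[1]
    gram = F_train.T @ F_train + lam * np.eye(p)
    coef = np.linalg.solve(gram, F_train.T @ y_train)
    return float(np.mean((F_test @ coef - y_test) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, N = 2000, 100, 200                    # illustrative sizes (assumption)
    beta = rng.standard_normal(d) / np.sqrt(d)  # hypothetical teacher direction

    X_tr = rng.standard_normal((n, d))
    X_te = rng.standard_normal((n, d))
    y_tr, y_te = np.tanh(X_tr @ beta), np.tanh(X_te @ beta)

    W0 = rng.standard_normal((d, N)) / np.sqrt(d)   # random initial first layer
    a = rng.standard_normal(N)                      # fixed second layer
    # Simplification: the gradient step reuses the ridge training set.
    W1 = one_step_features(X_tr, y_tr, W0, a, eta=1.0)

    candidates = {
        "initial random features": (np.tanh(X_tr @ W0), np.tanh(X_te @ W0)),
        "features after one step": (np.tanh(X_tr @ W1), np.tanh(X_te @ W1)),
        "linear model on input": (X_tr, X_te),
    }
    for name, (F_tr, F_te) in candidates.items():
        print(name, ridge_test_mse(F_tr, y_tr, F_te, y_te))
```

The numbers printed depend on the seed, the sizes, the ridge penalty, and the scaling of η, so this sketch should not be read as reproducing the asymptotic results stated in the abstract.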
Aug-19-2025, 20:48:30 GMT