High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki
Neural Information Processing Systems
We consider two scalings of the first-step learning rate η. For small η, we establish a Gaussian equivalence property for the trained feature map, and prove that the learned kernel improves upon the initial random feature model, but cannot defeat the best linear model on the input.
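To make the comparison in the abstract concrete, the following is a minimal NumPy sketch, not the authors' code. It takes one gradient step on the first-layer weights of a two-layer network and then fits ridge regression on three feature sets: the initial random features, the features after the step, and the raw input (the linear baseline). Everything not stated in the excerpt is an assumption made for illustration: the tanh activation, the squared loss, the 1/sqrt(N) output scaling, the single-index teacher, the problem sizes, and the values of `eta` and `lam`. The helper names `one_step_features` and `ridge_test_error` are hypothetical.

```python
import numpy as np


def one_step_features(X, y, W0, a, eta):
    """One gradient step on the first-layer weights W (second layer a is fixed).

    Assumed model: f(x) = a^T tanh(W^T x) / sqrt(N), with empirical loss
    L(W) = (1 / 2n) * sum_i (f(x_i) - y_i)^2.
    Returns the updated weights W1 and the trained feature matrix tanh(X W1).
    """
    n = X.shape[0]
    N = W0.shape[1]
    pre = X @ W0                              # (n, N) pre-activations
    act = np.tanh(pre)
    resid = act @ a / np.sqrt(N) - y          # (n,) prediction residuals
    # dL/dW[:, j] = (1/n) * sum_i resid_i * a_j / sqrt(N) * tanh'(pre_ij) * x_i
    grad = X.T @ (resid[:, None] * (1.0 - act ** 2) * a[None, :]) / (n * np.sqrt(N))
    W1 = W0 - eta * grad
    return W1, np.tanh(X @ W1)


def ridge_test_error(F_train, y_train, F_test, y_test, lam):
    """Fit ridge regression on the given features and return the test MSE."""
    p = F_train.shape[1]
    coef = np.linalg.solve(F_train.T @ F_train + lam * np.eye(p), F_train.T @ y_train)
    return float(np.mean((F_test @ coef - y_test) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, N = 400, 20, 50                     # illustrative sizes (assumed)
    eta, lam = 5.0, 1e-2                      # arbitrary illustrative values (assumed)

    # Assumed single-index teacher: y = tanh(<x, beta>) + noise.
    beta = rng.standard_normal(d) / np.sqrt(d)
    X_tr, X_te = rng.standard_normal((n, d)), rng.standard_normal((n, d))
    y_tr = np.tanh(X_tr @ beta) + 0.1 * rng.standard_normal(n)
    y_te = np.tanh(X_te @ beta) + 0.1 * rng.standard_normal(n)

    W0 = rng.standard_normal((d, N)) / np.sqrt(d)   # random first layer
    a = rng.standard_normal(N)                      # fixed second layer

    W1, F1_tr = one_step_features(X_tr, y_tr, W0, a, eta)

    errors = {
        "initial random features": ridge_test_error(
            np.tanh(X_tr @ W0), y_tr, np.tanh(X_te @ W0), y_te, lam),
        "features after one step": ridge_test_error(
            F1_tr, y_tr, np.tanh(X_te @ W1), y_te, lam),
        "linear model on input": ridge_test_error(X_tr, y_tr, X_te, y_te, lam),
    }
    for name, err in errors.items():
        print(f"{name}: test MSE = {err:.4f}")
```

The printed numbers depend on the seed and on the arbitrary sizes chosen here, so a single run illustrates the procedure only. The ordering stated in the abstract is an asymptotic result and need not show up at this scale.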
Aug-19-2025, 20:48:30 GMT
- Country:
  - Asia > Japan
    - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
  - Europe > United Kingdom
    - England > Cambridgeshire > Cambridge (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.14)
    - United States > California
      - San Diego County > San Diego (0.04)
- Genre:
  - Research Report > New Finding (0.67)
- Industry:
  - Education (0.46)
- Technology: