High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki

Aug-19-2025, 20:48:30 GMT–Neural Information Processing Systems

We consider two scalings of the first-step learning rate η. For small η, we establish a Gaussian equivalence property for the trained feature map and prove that the learned kernel improves upon the initial random feature model but cannot defeat the best linear model on the input.
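The comparison described above can be illustrated with a minimal NumPy sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the tanh activation, the squared loss, the 1/sqrt(N) output scaling, the single-index teacher, the ridge penalty `lam`, the value `eta=1.0`, and all function names. It takes one gradient step on the first-layer weights only, then fits ridge regression on the initial random features, on the features after the step, and on the raw input (as a stand-in for a linear model).

```python
import numpy as np


def forward(X, W, a):
    """Two-layer network f(x) = a^T tanh(W^T x) / sqrt(N), applied row-wise to X."""
    N = W.shape[1]
    return np.tanh(X @ W) @ a / np.sqrt(N)


def first_layer_gradient(X, y, W, a):
    """Gradient of (1/2n) * sum_i (f(x_i) - y_i)^2 with respect to W."""
    n, N = X.shape[0], W.shape[1]
    pre = X @ W                       # (n, N) pre-activations
    resid = forward(X, W, a) - y      # (n,) residuals f(x_i) - y_i
    # Chain rule: d f(x_i) / d W[:, j] = a_j * tanh'(pre_ij) * x_i / sqrt(N)
    back = resid[:, None] * (1.0 - np.tanh(pre) ** 2) * a[None, :] / np.sqrt(N)
    return X.T @ back / n             # (d, N)


def one_gradient_step(X, y, W0, a, eta):
    """Return W1 = W0 - eta * gradient; the second layer a stays fixed."""
    return W0 - eta * first_layer_gradient(X, y, W0, a)


def ridge_test_error(F_train, y_train, F_test, y_test, lam):
    """Fit ridge regression on F_train and return the test mean squared error."""
    k = F_train.shape[1]
    coef = np.linalg.solve(F_train.T @ F_train + lam * np.eye(k), F_train.T @ y_train)
    return float(np.mean((F_test @ coef - y_test) ** 2))


# Synthetic data from a hypothetical single-index teacher (an assumption).
rng = np.random.default_rng(0)
n, d, N, lam = 400, 20, 50, 1e-2
X_train = rng.standard_normal((n, d))
X_test = rng.standard_normal((n, d))
beta = rng.standard_normal(d) / np.sqrt(d)
y_train, y_test = np.tanh(X_train @ beta), np.tanh(X_test @ beta)

W0 = rng.standard_normal((d, N)) / np.sqrt(d)   # random first-layer initialization
a = rng.choice([-1.0, 1.0], size=N)             # fixed second layer
W1 = one_gradient_step(X_train, y_train, W0, a, eta=1.0)

errors = {
    "initial random features": ridge_test_error(
        np.tanh(X_train @ W0), y_train, np.tanh(X_test @ W0), y_test, lam),
    "features after one step": ridge_test_error(
        np.tanh(X_train @ W1), y_train, np.tanh(X_test @ W1), y_test, lam),
    "linear model on input": ridge_test_error(X_train, y_train, X_test, y_test, lam),
}
for name, err in errors.items():
    print(f"{name}: {err:.4f}")
```

The three printed errors correspond to the three predictors named in the abstract. The numbers depend on the assumed teacher, dimensions, and step size, so this small finite-size run should not be read as a reproduction of the asymptotic result.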

artificial intelligence, machine learning, neural network

Country:
- North America
  - United States > California
    - San Diego County > San Diego (0.04)
  - Canada > Ontario
    - Toronto (0.14)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan
  - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:
- Research Report > New Finding (0.67)

Industry:
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (1.00)
  - Statistical Learning > Gradient Descent (0.48)
