Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks
