On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning
