On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning