On the Efficiency of ERM in Feature Learning