Understanding and Improving Feature Learning for Out-of-Distribution Generalization