How does Gradient Descent Learn Features - A Local Analysis for Regularized Two-Layer Neural Networks Mo Zhou