How Does Gradient Descent Learn Features -- A Local Analysis for Regularized Two-Layer Neural Networks
