Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent
