When does gradient descent with logistic loss find interpolating two-layer networks?

Open in new window