When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?
