Convergence of SGD in Learning ReLU Models with Separable Data
