Non-convergence of stochastic gradient descent in the training of deep neural networks

Open in new window