Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
