A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks
