Do Neural Networks Need Gradient Descent to Generalize? A Theoretical Study
