Do Neural Networks Need Gradient Descent to Generalize? A Theoretical Study