For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NNs) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS methods, and we know of special examples for which SGD-trained NNs provably outperform RKHS methods. This is true even in the wide-network limit, for a different scaling of the initialization. How can we reconcile the above claims?
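For context (our gloss; the abstract does not write out the parametrization), the two regimes contrasted above are commonly expressed through the prefactor of a two-layer network with m hidden units:

\[
f(x; a, W) \;=\; \alpha(m) \sum_{i=1}^{m} a_i \,\sigma(\langle w_i, x\rangle),
\qquad
\alpha(m) = \tfrac{1}{\sqrt{m}} \ \text{(NTK/lazy regime)}
\quad \text{or} \quad
\alpha(m) = \tfrac{1}{m} \ \text{(mean-field regime)}.
\]

Under the \(1/\sqrt{m}\) scaling, SGD keeps the weights close to their initialization, so the trained network is well approximated by its linearization, an RKHS method; under the \(1/m\) scaling, the weights move at order one and the network can leave the RKHS class.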