4b5deb9a14d66ab0acc3b8a2360cde7c-Supplemental.pdf

Apr-25-2026, 18:53:31 GMT–Neural Information Processing Systems

What can linearized neural networks actually say about generalization?

As mentioned in the main text, all our models are trained using the same scheme, which was selected without any hyperparameter tuning beyond ensuring good performance on CIFAR2 for the neural networks. Namely, we train using stochastic gradient descent (SGD) to optimize a binary cross-entropy loss, with a decaying learning rate starting at 0.05 and momentum set to 0.9. Furthermore, we use a batch size of 128 and train for 100 epochs. This is enough to obtain close-to-zero training losses for the neural networks and to converge to a stable test accuracy in the case of the linearized models.
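The training scheme described above can be sketched in a few lines. The following is a minimal, dependency-free illustration and not the authors' code: it reuses only the stated hyperparameters (initial learning rate 0.05, momentum 0.9, batch size 128, 100 epochs, binary cross-entropy). Everything else is an assumption made for the sake of a runnable example: the exponential decay schedule and its factor, the momentum update convention, the logistic model standing in for a model that is linear in its parameters, and the synthetic two-class data standing in for CIFAR2. The names `learning_rate`, `make_blobs`, and `train` are hypothetical.

```python
import math
import random

# Hyperparameters stated in the text.
INITIAL_LR = 0.05
MOMENTUM = 0.9
BATCH_SIZE = 128
EPOCHS = 100


def learning_rate(epoch, decay=0.97):
    """ASSUMPTION: exponential decay. The text only says the rate decays from 0.05."""
    return INITIAL_LR * decay ** epoch


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy for a label in {0, 1}."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def make_blobs(n=256, seed=0):
    """Hypothetical stand-in data: two Gaussian blobs in 2D, one per class."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        label = i % 2
        center = 1.5 if label == 1 else -1.5
        xs.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        ys.append(label)
    return xs, ys


def predict_logit(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train(xs, ys, epochs=EPOCHS, seed=0):
    """Mini-batch SGD with momentum on a logistic model (linear in its parameters)."""
    rng = random.Random(seed)
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    vel_w, vel_b = [0.0] * dim, 0.0
    order = list(range(len(xs)))
    history = []  # mean training loss after each epoch
    for epoch in range(epochs):
        lr = learning_rate(epoch)
        rng.shuffle(order)
        for start in range(0, len(order), BATCH_SIZE):
            batch = order[start:start + BATCH_SIZE]
            grad_w, grad_b = [0.0] * dim, 0.0
            for i in batch:
                # Derivative of the BCE loss with respect to the logit.
                err = sigmoid(predict_logit(w, b, xs[i])) - ys[i]
                for j in range(dim):
                    grad_w[j] += err * xs[i][j] / len(batch)
                grad_b += err / len(batch)
            # ASSUMED convention: v <- mu * v + g, then theta <- theta - lr * v.
            vel_w = [MOMENTUM * v + g for v, g in zip(vel_w, grad_w)]
            vel_b = MOMENTUM * vel_b + grad_b
            w = [wj - lr * v for wj, v in zip(w, vel_w)]
            b -= lr * vel_b
        losses = [bce_with_logits(predict_logit(w, b, x), y) for x, y in zip(xs, ys)]
        history.append(sum(losses) / len(losses))
    return w, b, history


xs, ys = make_blobs()
w, b, history = train(xs, ys)
accuracy = sum((predict_logit(w, b, x) > 0) == (y == 1) for x, y in zip(xs, ys)) / len(xs)
```

Inspecting `history` gives a quick check of whether the training loss has flattened out within 100 epochs, which is the kind of convergence criterion the paragraph appeals to. For an actual neural network or its linearization, the same loop structure would apply with the gradient computed by automatic differentiation.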

artificial intelligence, eigenfunction, machine learning


Country:
- North America > United States (0.14)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (1.00)
  - Statistical Learning > Gradient Descent (0.54)
