Appendix

Feb-11-2026, 20:40:53 GMT–Neural Information Processing Systems

We held out a validation set from the training set and used it to select the L2 regularization hyperparameter, which we chose from 45 logarithmically spaced values between 10^-6 and 10^5, applied to the sum of the per-example losses. Because the optimization problem is convex, we used the previous weights as a warm start as we increased the L2 regularization hyperparameter. We measured either top-1 or mean per-class accuracy, depending on which was suggested by the dataset creators.

A.3 Fine-tuning

In our fine-tuning experiments in Table 2, we used standard ImageNet-style data augmentation and trained for 20,000 steps with SGD with momentum of 0.9 and cosine annealing [20] without restarts.

Each curve represents a different model.
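As a rough illustration of the two numerical recipes described above, the sketch below builds a 45-point logarithmic grid for the L2 hyperparameter and evaluates a cosine annealing schedule without restarts over 20,000 steps. The function names, the base learning rate of 0.1, and the decay to a final learning rate of zero are assumptions made for illustration; the text does not state them. With 45 points spanning 10^-6 to 10^5, the grid has four values per decade.

```python
import math


def l2_grid(num=45, low_exp=-6.0, high_exp=5.0):
    """Return `num` logarithmically spaced values from 10**low_exp to 10**high_exp."""
    step = (high_exp - low_exp) / (num - 1)  # exponent increment between neighbors
    return [10.0 ** (low_exp + i * step) for i in range(num)]


def cosine_lr(step, total_steps=20_000, base_lr=0.1):
    """Cosine annealing without restarts.

    base_lr=0.1 and the decay to zero are illustrative assumptions,
    not values taken from the text.
    """
    progress = min(step, total_steps) / total_steps  # fraction of training done
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Candidate L2 strengths, ordered from weakest to strongest so that each fit
# could be warm-started from the weights of the previous, weaker setting.
grid = l2_grid()

# Learning rate at the start, midpoint, and end of training.
schedule_samples = [cosine_lr(s) for s in (0, 10_000, 20_000)]
```

Iterating over the grid in increasing order matches the warm-start strategy in the text: the solution for one regularization strength serves as the initial point for the next, which is safe because the objective is convex.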




