Appendix A Training details
–Neural Information Processing Systems
Models are trained with Stochastic Gradient Descent with momentum 0.9. We use a step learning rate annealing scheme, multiplying the learning rate by 0.1 every 30 epochs. We train all models for 150 epochs. We then select the best learning rate and weight decay for each method and run 5 different seeds to report the mean and standard deviation. We use the validation set of ImageNet both to perform cross-validation and to report performance. In Section G we train the Augerino method on top of the ResNet-18 architecture.
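As an illustration of the schedule and reporting protocol described above, the following is a minimal sketch rather than the training code itself. The base learning rate of 0.1 is a hypothetical placeholder (the actual value is selected per method), the names `step_lr` and `summarize` are ours, epochs are assumed to be 0-indexed, and we assume the sample standard deviation is the one reported.

```python
import statistics


def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    """Learning rate at a 0-indexed epoch under step decay.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


def summarize(scores):
    """Mean and sample standard deviation over per-seed scores.

    Assumption: sample (n - 1) standard deviation; the text does not
    say which variant is reported.
    """
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical base learning rate; the real one is chosen by search.
base_lr = 0.1

# One learning rate per epoch for the full 150-epoch run.
schedule = [step_lr(base_lr, epoch) for epoch in range(150)]
```

Under these assumptions the rate stays constant within each block of 30 epochs and is decayed four times over the run, so the final 30 epochs use a rate of `base_lr * 0.1 ** 4`. In a PyTorch setup, the same schedule would typically be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)`.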
Nov-15-2025, 08:16:24 GMT