Predicting Training Time Without Training: Supplementary Material
Neural Information Processing Systems
Finally, we give proofs of all statements. Algorithm 1: Estimate the training time on a given target dataset and hyper-parameters. Comparison of predicted and real error curves. The model is trained on a subset of 2 classes of CIFAR-10 with 150 samples. Section 4: once the noise is rescaled, the SDE is able to predict the correct asymptotic behavior of SGD.
Oct-2-2025, 19:08:58 GMT