Predicting Training Time Without Training
–Neural Information Processing Systems
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function. To do so, we leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model. This allows us to approximate the training loss and accuracy at any point during training by solving a low-dimensional Stochastic Differential Equation (SDE) in function space. Using this result, we are able to predict the time it takes for Stochastic Gradient Descent (SGD) to fine-tune a model to a given loss without having to perform any training. In our experiments, we are able to predict training time of a ResNet within a 20\% error margin on a variety of datasets and hyper-parameters, at a 30 to 45-fold reduction in cost compared to actual training.
Neural Information Processing Systems
Oct-10-2024, 02:00:46 GMT
- Technology: