Predicting Training Time Without Training

Oct-10-2024, 02:00:46 GMT–Neural Information Processing Systems

We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function. To do so, we leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model. This allows us to approximate the training loss and accuracy at any point during training by solving a low-dimensional Stochastic Differential Equation (SDE) in function space. Using this result, we are able to predict the time it takes for Stochastic Gradient Descent (SGD) to fine-tune a model to a given loss without having to perform any training. In our experiments, we are able to predict training time of a ResNet within a 20\% error margin on a variety of datasets and hyper-parameters, at a 30 to 45-fold reduction in cost compared to actual training.

deep network, predict training time, training time

Neural Information Processing Systems

Oct-10-2024, 02:00:46 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)