The Computational Limits of Deep Learning
The relationship between performance, model complexity, and computational requirements in deep learning is still not well understood theoretically. Nevertheless, there are important reasons to believe that deep learning is intrinsically more reliant on computing power than other techniques, in particular due to the role of overparameterization and how this scales as additional training data are used to improve performance (e.g., classification error rate or root mean squared regression error). Classically this would lead to overfitting, but stochastic gradient-based optimization methods provide a regularizing effect due to early stopping [pillaud2018statistical, Belkin15849] (this is often called implicit regularization, since there is no explicit regularization term in the model), moving the neural networks into an interpolation regime, where the training data are fit almost exactly while still maintaining reasonable predictions on intermediate points [belkin2018overfitting, belkin2019does]. The challenge of overparameterization is that the number of deep learning parameters must grow as the number of data points grows. Since the cost of training a deep learning model scales with the product of the number of parameters and the number of data points, this implies that computational requirements grow as at least the square of the number of data points in the overparameterized setting.
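To make this scaling argument explicit (a sketch in our own notation; the symbols n and p below do not appear in the original text), let n denote the number of training data points and p the number of model parameters:

\[
\text{Compute} \;\propto\; p \cdot n, \qquad p \;\propto\; n \;\;\text{(overparameterization)} \quad\Longrightarrow\quad \text{Compute} \;\gtrsim\; n^{2}.
\]

That is, doubling the amount of training data in this regime at least quadruples the computation needed to train the model.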