Neural Tangent Kernel (NTK): A New Tool For Understanding Machine Learning Training
The conventional wisdom in the machine learning community is that a smaller model incurs a larger training error, while a bigger model incurs a larger generalisation gap. That is why developers usually hunt for the sweet spot between underfitting and overfitting. Counterintuitively, however, the best test error is often achieved by the largest model. As model complexity increases past the point where the model can fit the training data perfectly (the interpolation regime), test error continues to drop. The inner training dynamics of neural networks have long been a mystery, and unlocking them would lead to a better understanding of how these models arrive at their predictions.
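The interpolation regime described above can be probed with a small experiment. The following is a minimal sketch, not the article's own method: it assumes a random ReLU feature model with a minimum-norm least-squares readout as a stand-in for a neural network, and the target function, widths, sample sizes, and names such as `run_sweep` are all hypothetical choices made for illustration.

```python
import numpy as np


def random_relu_features(x, weights, biases):
    """Map inputs of shape (n, d) to ReLU features of shape (n, width)."""
    return np.maximum(x @ weights + biases, 0.0)


def fit_and_score(width, x_train, y_train, x_test, y_test, rng):
    """Fit a least-squares readout on `width` random ReLU features.

    Returns (train_mse, test_mse). The hidden weights are random and fixed;
    only the linear readout is fitted (an assumption of this sketch).
    """
    d = x_train.shape[1]
    weights = rng.normal(size=(d, width))
    biases = rng.normal(size=width)
    phi_train = random_relu_features(x_train, weights, biases)
    phi_test = random_relu_features(x_test, weights, biases)
    # For width > n_train the system is underdetermined and lstsq
    # returns the minimum-norm solution that fits the training data.
    coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    train_mse = float(np.mean((phi_train @ coef - y_train) ** 2))
    test_mse = float(np.mean((phi_test @ coef - y_test) ** 2))
    return train_mse, test_mse


def run_sweep(widths=(5, 15, 30, 60, 600), n_train=30, n_test=500,
              dim=5, noise=0.1, seed=0):
    """Sweep model width across the interpolation threshold (width ~ n_train)."""
    rng = np.random.default_rng(seed)

    def target(x):
        # Hypothetical smooth target function, chosen only for illustration.
        return np.sin(x.sum(axis=1))

    x_train = rng.uniform(-1.0, 1.0, size=(n_train, dim))
    x_test = rng.uniform(-1.0, 1.0, size=(n_test, dim))
    y_train = target(x_train) + noise * rng.normal(size=n_train)
    y_test = target(x_test)
    return {w: fit_and_score(w, x_train, y_train, x_test, y_test, rng)
            for w in widths}


if __name__ == "__main__":
    for width, (train_mse, test_mse) in run_sweep().items():
        print(f"width={width:4d}  train_mse={train_mse:.2e}  test_mse={test_mse:.2e}")
```

Once the width exceeds the number of training points, the training error should fall to numerical zero, which is the interpolation regime. How the test error behaves beyond that point depends on the data, noise level, and random seed, so the printed test errors are best read as one illustrative run and not as evidence for the general claim.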
Nov-13-2019, 07:47:08 GMT