Generalization Theory and Deep Nets, An introduction
Deep learning holds many mysteries for theory, as we have discussed on this blog. Lately many ML theorists have become interested in the generalization mystery: why do trained deep nets perform well on previously unseen data, even though they have way more free parameters than the number of datapoints (the classic "overfitting" regime)? Zhang et al.'s paper Understanding Deep Learning requires Rethinking Generalization played some role in bringing attention to this challenge. Their main experimental finding is that if you take a classic convnet architecture, say Alexnet, and train it on images with random labels, then you can still achieve very high accuracy on the training data. Needless to say, the trained net is subsequently unable to predict the (random) labels of still-unseen images, which means it doesn't generalize.
Dec-17-2017, 01:45:25 GMT