Network-to-Network Regularization: Enforcing Occam's Razor to Improve Generalization

Neural Information Processing Systems 

What makes a classifier have the ability to generalize? There have been a lot of important attempts to address this question, but a clear answer is still elusive. Proponents of complexity theory find that the complexity of the classifier's function