Temporal Dynamics of Generalization in Neural Networks
Wang, Changfeng, Venkatesh, Santosh S.
Neural Information Processing Systems, Dec-31-1995
This paper presents a rigorous characterization of how a general nonlinear learning machine generalizes during the training process when it is trained on a random sample using a gradient descent algorithm based on reduction of training error. It is shown, in particular, that best generalization performance occurs, in general, before the global minimum of the training error is achieved. The different roles played by the complexity of the machine class and the complexity of the specific machine in the class during learning are also precisely demarcated.

1 INTRODUCTION

In learning machines such as neural networks, two major factors that affect the 'goodness of fit' of the examples are network size (complexity) and training time. These are also the major factors that affect the generalization performance of the network. Many theoretical studies exploring the relation between generalization performance and machine complexity support the parsimony heuristics suggested by Occam's razor, to wit, that amongst machines with similar training performance one should opt for the machine of least complexity.
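The early-stopping phenomenon the abstract describes is easy to observe empirically. Below is a minimal sketch, not taken from the paper, that trains a one-hidden-layer tanh network by plain gradient descent on noisy samples of a smooth target and compares the epoch of lowest held-out error with the epoch of lowest training error. The network width, learning rate, noise level, and target function are illustrative choices, not the authors' setup.

# A minimal sketch (illustrative hyperparameters, not the paper's
# experimental setup): train a small network by full-batch gradient
# descent and track training vs. held-out error over time. On runs like
# this, the held-out error typically bottoms out well before the
# training error does, in line with the paper's claim that best
# generalization occurs before the training-error minimum is reached.
import numpy as np

rng = np.random.default_rng(0)

# Target: a smooth nonlinear function observed with additive noise.
def target(x):
    return np.sin(2.0 * x)

n_train, n_test, h = 30, 200, 20
x_tr = rng.uniform(-np.pi, np.pi, size=(n_train, 1))
y_tr = target(x_tr) + 0.3 * rng.standard_normal((n_train, 1))
x_te = rng.uniform(-np.pi, np.pi, size=(n_test, 1))
y_te = target(x_te)                      # noiseless held-out targets

# One-hidden-layer tanh network with squared-error loss.
W1 = 0.5 * rng.standard_normal((1, h)); b1 = np.zeros(h)
W2 = 0.5 * rng.standard_normal((h, 1)); b2 = np.zeros(1)

def forward(x):
    a = np.tanh(x @ W1 + b1)
    return a, a @ W2 + b2

def mse(x, y):
    _, p = forward(x)
    return float(np.mean((p - y) ** 2))

lr, epochs = 0.05, 5000
history = []
for t in range(epochs):
    a, p = forward(x_tr)
    err = p - y_tr                       # loss gradient w.r.t. output
    # Backpropagation through the two layers.
    gW2 = a.T @ err / n_train
    gb2 = err.mean(axis=0)
    da = (err @ W2.T) * (1 - a ** 2)
    gW1 = x_tr.T @ da / n_train
    gb1 = da.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
    history.append((mse(x_tr, y_tr), mse(x_te, y_te)))

train_err = [tr for tr, _ in history]
test_err = [te for _, te in history]
print("epoch of minimum training error:", int(np.argmin(train_err)))
print("epoch of best generalization:  ", int(np.argmin(test_err)))

With enough epochs the training error keeps shrinking as the network fits the noise, so its minimum sits at or near the final epoch, while the held-out error is minimized much earlier; stopping at that earlier epoch is the practical reading of the paper's result.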