On Stochastic Complexity and Admissible Models for Neural Network Classifiers

Neural Information Processing Systems 

In this paper we examine in a general sense the application of Minimum Description Length (MDL) techniques to the problem of selecting a good classifier from a large set of candidate models or hypotheses. Pattern recognition algorithms differ from more conventional statistical modeling techniques in the sense that they typically choose from a very large number of candidate models to describe the available data. Hence, the problem of searching through this set of candidate models is frequently a formidable one, often approached in practice by the use of greedy algorithms. In this context, techniques which allow us to eliminate portions of the hypothesis space are of considerable interest. We will show in this paper that it is possible to use the intrinsic structure of the MDL formalism to eliminate large numbers of candidate models given only minimal information about the data.