Shooting Craps in Search of an Optimal Strategy for Training Connectionist Pattern Classifiers

Neural Information Processing Systems 

We compare two strategies for training connectionist (as well as non-connectionist) models for statistical pattern recognition. The probabilistic strategy is based on the notion that Bayesian discrimination (i.e., optimal classification) is achieved when the classifier learns the a posteriori class distributions of the random feature vector. The differential strategy is based on the notion that the identity of the largest class a posteriori probability of the feature vector is all that is needed to achieve Bayesian discrimination. Each strategy is directly linked to a family of objective functions that can be used in the supervised training procedure. We prove that the probabilistic strategy, which is linked with the error-measure objective functions typically used to train classifiers (such as mean-squared error and cross-entropy), necessarily requires larger training sets and more complex classifier architectures than those needed to approximate the Bayesian discriminant function.
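The distinction between the two strategies can be illustrated with a small sketch. It is not the paper's proof or its objective functions; the posterior values, the `crude_outputs` table, and the helper names `decide` and `mean_squared_error` are all hypothetical, chosen only to show that a classifier can estimate the a posteriori probabilities poorly and still make the Bayes-optimal decision, provided it identifies the largest one.

```python
# Hypothetical a posteriori probabilities P(class | x) for three feature
# values and two classes (illustrative numbers, not taken from the paper).
posteriors = {
    "x1": [0.90, 0.10],
    "x2": [0.55, 0.45],
    "x3": [0.20, 0.80],
}

# Hypothetical classifier outputs: poor estimates of the posteriors,
# but with the same largest entry for every feature value.
crude_outputs = {
    "x1": [0.60, 0.40],
    "x2": [0.90, 0.10],
    "x3": [0.45, 0.55],
}


def decide(outputs):
    """Return the index of the largest output (the chosen class)."""
    return max(range(len(outputs)), key=lambda k: outputs[k])


def mean_squared_error(estimates, targets):
    """Average squared difference over all feature values and classes."""
    total, count = 0.0, 0
    for x, target in targets.items():
        for est, tgt in zip(estimates[x], target):
            total += (est - tgt) ** 2
            count += 1
    return total / count


# Decisions made from the true posteriors (Bayesian discrimination).
bayes_decisions = {x: decide(p) for x, p in posteriors.items()}
# Decisions made from the crude outputs.
crude_decisions = {x: decide(o) for x, o in crude_outputs.items()}
# How badly the crude outputs approximate the posteriors.
mse = mean_squared_error(crude_outputs, posteriors)

print("Bayes decisions:", bayes_decisions)
print("Crude decisions:", crude_decisions)
print("MSE of crude outputs vs. posteriors:", mse)
```

In this toy setting the crude outputs have a clearly nonzero mean-squared error relative to the posteriors, yet they yield the same class decisions. This is the intuition behind the differential strategy: matching the posteriors exactly is sufficient for Bayesian discrimination but not necessary.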