On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
Ng, Andrew Y., Jordan, Michael I.
Neural Information Processing Systems
Discriminative classifiers model the posterior p(y|x) directly, or learn a direct map from inputs x to the class labels. There are several compelling reasons for using discriminative rather than generative classifiers, one of which, succinctly articulated by Vapnik [6], is that "one should solve the [classification] problem directly and never solve a more general problem as an intermediate step [such as modeling p(x|y)]." Indeed, leaving aside computational issues and matters such as handling missing data, the prevailing consensus seems to be that discriminative classifiers are almost always to be preferred to generative ones. Another piece of prevailing folk wisdom is that the number of examples needed to fit a model is often roughly linear in the number of free parameters of the model. This has its theoretical basis in the observation that for "many" models the VC dimension is roughly linear or at most some low-order polynomial in the number of parameters (see, e.g., [1, 3]), and in the fact that sample complexity in the discriminative setting is linear in the VC dimension [6]. In this paper, we study empirically and theoretically the extent to which these beliefs are true. A parametric family of probabilistic models p(x, y) can be fit either to optimize the joint likelihood of the inputs and the labels, or to optimize the conditional likelihood p(y|x), or even to minimize the 0-1 training error obtained by thresholding p(y|x) to make predictions.
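The contrast the abstract draws is between fitting the same kind of model generatively (maximizing the joint likelihood p(x, y)) and discriminatively (maximizing the conditional likelihood p(y|x)). Below is a minimal illustrative sketch of the kind of comparison the paper runs empirically, using scikit-learn's GaussianNB and LogisticRegression as stand-ins for the generative/discriminative pair; the synthetic Gaussian data, dimensionality, and sample sizes are assumptions made here for illustration, not the paper's experimental setup.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two Gaussian classes in 20 dimensions (illustrative data, not the
# datasets used in the paper).
def sample(n):
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=y[:, None] * 0.3, scale=1.0, size=(n, 20))
    return x, y

x_test, y_test = sample(5000)

# Compare test error as the training set grows: the generative model
# (naive Bayes) tends to approach its asymptotic error faster, while the
# discriminative model (logistic regression) tends to reach a lower
# asymptotic error given enough data.
for m in (10, 30, 100, 300, 1000):
    x_train, y_train = sample(m)
    nb = GaussianNB().fit(x_train, y_train)
    lr = LogisticRegression(max_iter=1000).fit(x_train, y_train)
    nb_err = 1 - nb.score(x_test, y_test)
    lr_err = 1 - lr.score(x_test, y_test)
    print(f"m={m:5d}  naive Bayes err={nb_err:.3f}  logistic err={lr_err:.3f}")
```

Sweeping the training-set size m in this way mirrors the learning-curve comparison that motivates the paper's main question: which regime (small m vs. large m) favors which classifier.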
Dec-31-2002