Model Complexity, Goodness of Fit and Diminishing Returns
Neural Information Processing Systems
We consider modeling the data set D using models indexed by a complexity index k, 1 ≤ k ≤ kmax. For example, the models could be finite mixture probability density functions (PDFs) for vectors x_i, where model complexity is indexed by the number of components k in the mixture. Alternatively, the modeling task could be to fit a conditional regression model y = g(z_k) + e, where y is one of the variables in the vector x and z_k is some subset of size k of the remaining components of x. Such learning tasks can typically be characterized by the existence of a model and a loss function. A fitted model of complexity k is a function of the data points D and depends on a specific set of fitted parameters θ. The loss function (goodness-of-fit) is a functional of the model and maps each specific model to a scalar used to evaluate the model, e.g., likelihood for density estimation or sum-of-squares for regression. Figure 1 illustrates a typical empirical curve of loss function versus complexity, for mixtures of Markov models fitted to a large data set of 900,000 sequences. The complexity k is the number of Markov models in the mixture (see Cadez et al. (2000) for further details on the model and the data set). The empirical curve has a distinctly concave appearance, with large relative gains in fit for low-complexity models and much more modest relative gains for high-complexity models.
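The concave loss-versus-complexity behavior described above can be reproduced with a much simpler stand-in than the paper's Markov-model mixtures. The sketch below (an illustrative assumption, not the authors' experiment) fits polynomial regression models of increasing degree k to synthetic data and records the training sum-of-squares loss: because the models are nested, the loss is non-increasing in k, with large gains at low complexity and diminishing returns thereafter.

```python
import numpy as np

# Synthetic data from a cubic signal plus noise (hypothetical example,
# standing in for the paper's 900,000-sequence data set).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x**3 + rng.normal(scale=0.1, size=x.shape)

def training_sse(degree):
    """Sum-of-squares training loss for a degree-`degree` polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(residuals @ residuals)

# Complexity index k = polynomial degree, 1 <= k <= kmax.
degrees = range(1, 9)
sse = [training_sse(k) for k in degrees]

# The gain from the first complexity increments dwarfs later ones,
# giving the concave loss-vs-complexity curve of Figure 1.
for k, s in zip(degrees, sse):
    print(f"k={k}: SSE={s:.3f}")
```

Here the early jump (degree 1 to 3, which captures the true cubic) removes most of the loss, while degrees 4 through 8 only chase noise, mirroring the diminishing returns seen for mixture components.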