Generalization to Unseen Cases

Roos, Teemu, Grünwald, Peter, Myllymäki, Petri, Tirri, Henry

Neural Information Processing Systems 

We analyze classification error on unseen cases, i.e., cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a data-dependent bound on the difference between off-training-set and standard generalization error. Our result is based on a new bound on the missing mass, which for small samples is stronger than existing bounds based on Good-Turing estimators. As we demonstrate on UCI data-sets, our bound gives nontrivial generalization guarantees in many practical cases. In light of these results, we show that certain claims made in the No Free Lunch literature are overly pessimistic.
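
For readers unfamiliar with the missing mass (the total probability of cases that do not appear in the training set), the following minimal sketch computes the classical Good-Turing estimate of it: the fraction of observations whose value occurs exactly once in the sample. This illustrates only the existing estimator that the abstract compares against, not the new bound derived in the paper. The function name `good_turing_missing_mass` and the toy sample are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def good_turing_missing_mass(sample):
    """Classical Good-Turing estimate of the missing mass.

    Returns (number of distinct values seen exactly once) / (sample size).
    This is the textbook estimator, not the bound from the paper.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    # Count the distinct values that occur exactly once (singletons).
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


# Hypothetical toy sample of discrete cases (hashable feature tuples).
toy_sample = [("a", 0), ("a", 0), ("b", 1), ("c", 0), ("b", 1), ("d", 1)]
estimate = good_turing_missing_mass(toy_sample)
```

Here two of the six observations are singletons, so the estimate is 1/3. A large estimate suggests that a substantial share of future cases will be unseen, which is the regime where off-training-set error and standard generalization error can differ most.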
