The Curse of Highly Variable Functions for Local Kernel Machines
Bengio, Yoshua; Delalleau, Olivier; Le Roux, Nicolas
We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms that rely solely on the smoothness prior, with similarity between examples expressed with a local kernel, are sensitive to the curse of dimensionality, or more precisely to the variability of the target. Our discussion covers supervised, semi-supervised and unsupervised learning algorithms. These algorithms are found to be local in the sense that crucial properties of the learned function at x depend mostly on the neighbors of x in the training set. This makes them sensitive to the curse of dimensionality, well studied for classical nonparametric statistical learning. We show in the case of the Gaussian kernel that when the function to be learned has many variations, these algorithms require a number of training examples proportional to the number of variations, which could be large even though there may exist short descriptions of the target function, i.e., their Kolmogorov complexity may be low. This suggests that there exist non-local learning algorithms that at least have the potential to learn about such structured but apparently complex functions (because locally they have many variations), while not using very specific prior domain knowledge.
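As a concrete illustration of the locality argument (a minimal sketch, not code from the paper): the snippet below fits a Nadaraya-Watson estimator with a Gaussian kernel to a rapidly oscillating one-dimensional target. The target function, the bandwidth, and the training-set sizes are all illustrative assumptions chosen to make the effect visible, not values taken from the paper.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    # Gaussian-kernel weighted average: the prediction at each query point
    # is dominated by training points within ~bandwidth of it (locality).
    d2 = (x_query[:, None] - x_train[None, :]) ** 2
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(0)

def target(x):
    # Rapidly oscillating target: 10 full periods on [0, 1], so the sign
    # of the function changes many times across the input domain.
    return np.sign(np.sin(20 * np.pi * x))

x_query = np.linspace(0.0, 1.0, 1000)

for n_train in (10, 200):  # fewer vs. more examples than variations
    x_train = rng.uniform(0.0, 1.0, n_train)
    y_hat = nadaraya_watson(x_train, target(x_train), x_query, bandwidth=0.02)
    err = np.mean(np.sign(y_hat) != target(x_query))
    print(f"n_train={n_train:4d}  sign-error rate ~ {err:.2f}")
```

With 10 training examples, most oscillations contain no training point, so the locally weighted prediction misses most sign changes; with 200 examples the error drops sharply. This mirrors the abstract's claim that the number of examples such local estimators need grows with the number of variations of the target, even though the target itself has a very short description.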
Neural Information Processing Systems
Dec-31-2006