polynomial classifier
Automatic Capacity Tuning of Very Large VC-Dimension Classifiers
Large VC-dimension classifiers can learn difficult tasks, but are usually impractical because they generalize well only if they are trained with huge quantities of data. In this paper we show that even high-order polynomial classifiers in high-dimensional spaces can be trained with a small amount of training data and yet generalize better than classifiers with a smaller VC-dimension. This is achieved with a maximum margin algorithm (the Generalized Portrait). The technique is applicable to a wide variety of classifiers, including Perceptrons, polynomial classifiers (sigma-pi unit networks) and Radial Basis Functions. The effective number of parameters is adjusted automatically by the training algorithm to match the complexity of the problem.
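To make the notion of margin in a polynomial feature space concrete, the following is a minimal sketch, not the training algorithm of the paper. It assumes a degree-2 expansion of two inputs and hand-picked weight vectors (the names `phi`, `margin`, `w_poly`, and `w_linear` are illustrative), and it only evaluates the geometric margin of a given decision function on XOR-like data. A maximum margin algorithm would search for the weights that make this quantity as large as possible.

```python
from math import sqrt

def phi(x):
    """Degree-2 polynomial expansion of a 2-D input (no constant term)."""
    x1, x2 = x
    return [x1, x2, x1 * x1, x1 * x2, x2 * x2]

def margin(w, b, points, labels):
    """Smallest signed distance y * (w . phi(x) + b) / ||w|| over the data.

    A positive value means every point is on the correct side of the
    decision surface; a negative value means at least one is misclassified.
    """
    norm = sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * fi for wi, fi in zip(w, phi(x))) + b) / norm
        for x, y in zip(points, labels)
    )

# XOR-like data: the label is the sign of x1 * x2.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]

# Hand-picked weights (an assumption for illustration, not learned):
w_poly = [0.0, 0.0, 0.0, 1.0, 0.0]    # uses only the x1 * x2 monomial
w_linear = [1.0, 1.0, 0.0, 0.0, 0.0]  # uses only the linear terms

print(margin(w_poly, 0.0, points, labels))    # positive: data separated
print(margin(w_linear, 0.0, points, labels))  # negative: not separated
```

The second-order term separates data that no purely linear function of the inputs can, which is the kind of extra capacity a polynomial classifier provides.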
Bucketed PCA Neural Networks with Neurons Mirroring Signals
The bucketed PCA neural network (PCA-NN) with transforms is developed here in an effort to benchmark deep neural networks (DNNs) on supervised classification problems. Most classical PCA models apply PCA to the entire training data set to establish a reductive representation and then employ non-network tools such as high-order polynomial classifiers. In contrast, the bucketed PCA-NN applies PCA to individual buckets, which are constructed in two consecutive phases, and retains a genuine neural network architecture. This facilitates a fair apples-to-apples comparison with DNNs, in particular revealing that a major share of the accuracy achieved by many impressive DNNs could possibly be explained by the bucketed PCA-NN (e.g., 96% out of 98% for the MNIST data set). Compared with most DNNs, the three building blocks of the bucketed PCA-NN (PCA, transforms, and bucketing for error correction) are conceptually easier to comprehend. Furthermore, unlike the somewhat quasi-random neurons ubiquitously observed in DNNs, the PCA neurons resemble or mirror the input signals and are therefore more straightforward to decipher.
Parallelized Tensor Train Learning of Polynomial Classifiers
Chen, Zhongming, Batselier, Kim, Suykens, Johan A. K., Wong, Ngai
Pattern classification is the machine learning task of identifying to which category a new observation belongs, on the basis of a training set of observations whose category membership is known. Machine learning that uses a known training dataset to make predictions in this way is called supervised learning. It has been studied extensively and is widely applied in bioinformatics [1], computer-aided diagnosis (CAD) [2], machine vision [3], speech recognition [4], handwriting recognition [5], spam detection and many other fields [6], [7], [8]. Usually, different kinds of learning methods use different models to generalize from training examples to novel test examples. As pointed out in [9], [10], one of the important invariants in these applications is the local structure: variables that are spatially or temporally nearby are highly correlated. Local correlations benefit the extraction of local features because configurations of neighboring variables can be classified into a small number of categories (e.g.
A Sequence Kernel and its Application to Speaker Recognition
A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit-expansion kernel dramatically reduces classifier model size and computation, both of which are one hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
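As a rough illustration of comparing sequences through an explicit expansion, the sketch below maps every observation to its monomials up to a fixed degree, averages those vectors over the sequence, and takes an inner product of the two averages. This is a simplified assumption, not the kernel derived in the paper: any normalization or weighting the authors apply is omitted, and the function names (`poly_expand`, `mean_expansion`, `sequence_kernel`) are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod

def poly_expand(x, degree=2):
    """All monomials of the entries of x up to `degree`, constant term first."""
    feats = []
    for d in range(degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            feats.append(prod(x[i] for i in idx))  # empty product is 1
    return feats

def mean_expansion(seq, degree=2):
    """Average the expanded observations; the result has a fixed length
    that does not depend on how many observations the sequence holds."""
    expanded = [poly_expand(x, degree) for x in seq]
    n = len(expanded)
    return [sum(col) / n for col in zip(*expanded)]

def sequence_kernel(seq_a, seq_b, degree=2):
    """Unnormalized inner product of the two averaged expansions."""
    mean_a = mean_expansion(seq_a, degree)
    mean_b = mean_expansion(seq_b, degree)
    return sum(u * v for u, v in zip(mean_a, mean_b))

# Two sequences of 2-D observations with different lengths.
seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[1.0, 1.0]]
print(sequence_kernel(seq_a, seq_b))
```

Because each sequence collapses to one fixed-size vector, a comparison costs a single inner product regardless of sequence length, which is one way an explicit expansion can keep model size and computation small.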