We review common methods of solving for multi-class from binary and generalize them to a common framework. Since conditional probabilties are useful both for quantifying the accuracy of an estimate and for calibration purposes, these are a required part of the solution. There is some indication that the best solution for multi-class classification is dependent on the particular dataset. As such, we are particularly interested in data-driven solution design, whether based on a priori considerations or empirical examination of the data. Numerical results indicate that while a one-size-fits-all solution consisting of one-versus-one is appropriate for most datasets, a minority will benefit from a more customized approach. The techniques discussed in this paper allow for a large variety of multi-class configurations and solution methods to be explored so as to optimize classification accuracy, accuracy of conditional probabilities and speed.

Support vector machines (SVM) and other kernel techniques represent a family of powerful statistical classification methods with high accuracy and broad applicability. Because they use all or a significant portion of the training data, however, they can be slow, especially for large problems. Piecewise linear classifiers are similarly versatile, yet have the additional advantages of simplicity, ease of interpretation and, if the number of component linear classifiers is not too large, speed. Here we show how a simple, piecewise linear classifier can be trained from a kernel-based classifier in order to improve the classification speed. The method works by finding the root of the difference in conditional probabilities between pairs of opposite classes to build up a representation of the decision boundary. When tested on 17 different datasets, it succeeded in improving the classification speed of a SVM for 9 of them by factors as high as 88 times or more. The method is best suited to problems with continuum features data and smooth probability functions. Because the component linear classifiers are built up individually from an existing classifier, rather than through a simultaneous optimization procedure, the classifier is also fast to train.

Probability estimates are desirable in statistical classification both for gauging the accuracy of a classification result and for calibration. Here we describe a method of solving for the conditional probabilities in multi-class classification using orthogonal error correcting codes. The method is tested on six different datasets using support vector machines and compares favorably with an existing technique based on the one-versus-one multi-class method. Probabilities are validated based on the cumulative sum of a boolean evaluation of the correctness of the class label divided by the estimated probability. Probability estimation using orthogonal coding is simple and efficient and has the potential for faster classification results than the one-versus-one method.

Supervised statistical classification is a vital tool for satellite image processing. It is useful not only when a discrete result, such as feature extraction or surface type, is required, but also for continuum retrievals by dividing the quantity of interest into discrete ranges. Because of the high resolution of modern satellite instruments and because of the requirement for real-time processing, any algorithm has to be fast to be useful. Here we describe an algorithm based on kernel estimation called Adaptive Gaussian Filtering that incorporates several innovations to produce superior efficiency as compared to three other popular methods: k-nearest-neighbour (KNN), Learning Vector Quantization (LVQ) and Support Vector Machines (SVM). This efficiency is gained with no compromises: accuracy is maintained, while estimates of the conditional probabilities are returned. These are useful not only to gauge the accuracy of an estimate in the absence of its true value, but also to re-calibrate a retrieved image and as a proxy for a discretized continuum variable. The algorithm is demonstrated and compared with the other three on a pair of synthetic test classes and to map the waterways of the Netherlands. Software may be found at: http://libagf.sourceforge.net.

In either case, the better your knowledge of data structures and algorithms, the easier time you'll have when it comes time to code up. I don't think the data structures used in machine learning are significantly different than those used in other areas of software development. Because of the size and difficulty of many of the problems, however, having a really solid handle on the basics is essential. Also, because machine learning is a very mathematical field, one should keep in mind how data structures can be used to solve mathematical problems and how they are mathematical objects in their own right. There are two ways to classify data structures: by their implementation and by their operation.