Statistical Learning
Using Manifold Stucture for Partially Labeled Classification
Belkin, Mikhail, Niyogi, Partha
We consider the general problem of utilizing both labeled and unlabeled datato improve classification accuracy. Under t he assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on t he submanifold inquestion rather than the total ambient space. Using the Laplace Beltrami operator one produces a basis for a Hilbert space of square integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once a basis is obtained, trainingcan be performed using the labeled data set.
Fast Sparse Gaussian Process Methods: The Informative Vector Machine
Herbrich, Ralf, Lawrence, Neil D., Seeger, Matthias
We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles,previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated inO(d) rather than O(n), d n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements.
The Decision List Machine
Sokolova, Marina, Marchand, Mario, Japkowicz, Nathalie, Shawe-taylor, John S.
We introduce a new learning algorithm for decision lists to allow features that are constructed from the data and to allow a tradeoff betweenaccuracy and complexity. We bound its generalization error in terms of the number of errors and the size of the classifier it finds on the training data. We also compare its performance on some natural data sets with the set covering machine and the support vector machine.
Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
Fischer, Bernd, Schumann, Johann, Buntine, Wray, Gray, Alexander G.
Machine learning has reached a point where many probabilistic methods canbe understood as variations, extensions and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm. This enables the systematic derivation of algorithms customized fordifferent models. Here, we describe the AUTOBAYES system which takes a high-level statistical model specification, uses powerful symbolic techniques based on schema-based program synthesis and computer algebra to derive an efficient specialized algorithm for learning that model, and generates executable code implementing that algorithm. This capability is far beyond that of code collections such as Matlab toolboxes oreven tools for model-independent optimization such as BUGS for Gibbs sampling: complex new algorithms can be generated without newprogramming, algorithms can be highly specialized and tightly crafted for the exact structure of the model and data, and efficient and commented code can be generated for different languages or systems.
Stochastic Neighbor Embedding
Hinton, Geoffrey E., Roweis, Sam T.
We describe a probabilistic approach to the task of placing objects, described byhigh-dimensional vectors or by pairwise dissimilarities, in a low-dimensional space in a way that preserves neighbor identities. A Gaussian is centered on each object in the high-dimensional space and the densities under this Gaussian (or the given dissimilarities) are used to define a probability distribution over all the potential neighbors of the object. The aim of the embedding is to approximate this distribution aswell as possible when the same operation is performed on the low-dimensional "images" of the objects. A natural cost function is a sum of Kullback-Leibler divergences, one per object, which leads to a simple gradient for adjusting the positions of the low-dimensional images. Unlikeother dimensionality reduction methods, this probabilistic framework makes it easy to represent each object by a mixture of widely separated low-dimensional images. This allows ambiguous objects, like the document count vector for the word "bank", to have versions close to the images of both "river" and "finance" without forcing the images of outdoor concepts to be located close to those of corporate concepts.
Clustering with the Fisher Score
Tsuda, Koji, Kawanabe, Motoaki, Müller, Klaus-Robert
Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved inthe space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithmspecialized for the Fisher score, which can exploit important dimensions.This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
Hyperkernels
Ong, Cheng S., Williamson, Robert C., Smola, Alex J.
We consider the problem of choosing a kernel suitable for estimation using a Gaussian Process estimator or a Support Vector Machine. A novel solution is presented which involves defining a Reproducing Kernel HilbertSpace on the space of kernels itself. By utilizing an analog of the classical representer theorem, the problem of choosing a kernel from a parameterized family of kernels (e.g. of varying width) is reduced to a statistical estimation problem akin to the problem of minimizing a regularized risk functional. Various classical settings for model or kernel selection are special cases of our framework.
Learning to Classify Galaxy Shapes Using the EM Algorithm
Kirshner, Sergey, Cadez, Igor V., Smyth, Padhraic, Kamath, Chandrika
We describe the application of probabilistic model-based learning to the problem of automatically identifying classes of galaxies, based on both morphological and pixel intensity characteristics. The EM algorithm can be used to learn how to spatially orient a set of galaxies so that they are geometrically aligned. We augment this "ordering-model" with a mixture model on objects, and demonstrate how classes of galaxies can be learned in an unsupervised manner using a two-level EM algorithm. The resulting models provide highly accurate classi£cation of galaxies in cross-validation experiments.
Real-Time Monitoring of Complex Industrial Processes with Particle Filters
Morales-Menéndez, Rubén, Freitas, Nando de, Poole, David
We consider two ubiquitous processes:an industrial dryer and a level tank. For these applications, wecompared three particle filtering variants: standard particle filtering, Rao-Blackwellised particle filtering and a version of Rao-Blackwellised particle filtering that does one-step look-ahead to select good sampling regions. We show that the overhead of the extra processing perparticle of the more sophisticated methods is more than compensated bythe decrease in error and variance.