Country
(Not) Bounding the True Error
We present a new approach to bounding the true error rate of a continuous valued classifier based upon PAC-Bayes bounds. The method first constructs adistribution over classifiers by determining how sensitive each parameter in the model is to noise. The true error rate of the stochastic classifier found with the sensitivity analysis can then be tightly bounded using a PAC-Bayes bound.
Minimax Probability Machine
Lanckriet, Gert, Ghaoui, Laurent E., Bhattacharyya, Chiranjib, Jordan, Michael I.
When constructing a classifier, the probability of correct classification offuture data points should be maximized. In the current paper this desideratum is translated in a very direct way into an optimization problem, which is solved using methods from convex optimization.We also show how to exploit Mercer kernels in this setting to obtain nonlinear decision boundaries. A worst-case bound on the probability of misclassification of future data is obtained explicitly. 1 Introduction Consider the problem of choosing a linear discriminant by minimizing the probabilities thatdata vectors fall on the wrong side of the boundary. One way to attempt to achieve this is via a generative approach in which one makes distributional assumptions aboutthe class-conditional densities and thereby estimates and controls the relevant probabilities. The need to make distributional assumptions, however, casts doubt on the generality and validity of such an approach, and in discriminative solutionsto classification problems it is common to attempt to dispense with class-conditional densities entirely.
A Dynamic HMM for On-line Segmentation of Sequential Data
Kohlmorgen, Jens, Lemm, Steven
We propose a novel method for the analysis of sequential data that exhibits an inherent mode switching. In particular, the data might be a non-stationary time series from a dynamical system that switches between multiple operating modes. Unlike other approaches, ourmethod processes the data incrementally and without any training of internal parameters. We use an HMM with a dynamically changingnumber of states and an online variant of the Viterbi algorithm that performs an unsupervised segmentation and classification of the data on-the-fly, i.e. the method is able to process incomingdata in real-time. The main idea of the approach is to track and segment changes of the probability density of the data in a sliding window on the incoming data stream.
Online Learning with Kernels
Kivinen, Jyrki, Smola, Alex J., Williamson, Robert C.
We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the -trick allows us to give a robust parameterization.
Active Information Retrieval
Jaakkola, Tommi, Siegelmann, Hava T.
In classical large information retrieval systems, the system responds to a user initiated query with a list of results ranked by relevance. The users may further refine their query as needed. This process may result in a lengthy correspondence without conclusion. We propose an alternative active learning approach, where the system respondsto the initial user's query by successively probing the user for distinctions at multiple levels of abstraction. The system's initiated queries are optimized for speedy recovery and the user is permitted to respond with multiple selections or may reject the query. The information is in each case unambiguously incorporated by the system and the subsequent queries are adjusted to minimize the need for further exchange. The system's initiated queries are subject to resource constraints pertaining to the amount of information thatcan be presented to the user per iteration.
The Method of Quantum Clustering
We propose a novel clustering method that is an extension of ideas inherent toscale-space clustering and support-vector clustering. Like the latter, itassociates every data point with a vector in Hilbert space, and like the former it puts emphasis on their total sum, that is equal to the scalespace probabilityfunction. The novelty of our approach is the study of an operator in Hilbert space, represented by the Schrรถdinger equation of which the probability function is a solution. This Schrรถdinger equation contains a potential function that can be derived analytically from the probability function.
Kernel Feature Spaces and Nonlinear Blind Souce Separation
Harmeling, Stefan, Ziehe, Andreas, Kawanabe, Motoaki, Mรผller, Klaus-Robert
In kernel based learning the data is mapped to a kernel feature space of a dimension that corresponds to the number of training data points. In practice, however, the data forms a smaller submanifold in feature space, a fact that has been used e.g. by reduced set techniques for SVMs. We propose a new mathematical construction that permits to adapt to the intrinsic dimensionand to find an orthonormal basis of this submanifold. In doing so, computations get much simpler and more important our theoretical framework allows to derive elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings. Experiments demonstrate the good performance and high computational efficiency of our kTDSEP algorithm for the problem of nonlinear BSS.
Escaping the Convex Hull with Extrapolated Vector Machines
Maximum margin classifiers such as Support Vector Machines (SVMs) critically depends upon the convex hulls of the training samples of each class, as they implicitly search for the minimum distance between the convex hulls. We propose Extrapolated Vector Machines(XVMs) which rely on extrapolations outside these convex hulls. XVMs improve SVM generalization very significantly on the MNIST [7] OCR data. They share similarities with the Fisher discriminant: maximize the inter-class margin while minimizing theintra-class disparity.
Discriminative Direction for Kernel Classifiers
In many scientific and engineering applications, detecting and understanding differencesbetween two groups of examples can be reduced to a classical problem of training a classifier for labeling new examples while making as few mistakes as possible. In the traditional classification setting,the resulting classifier is rarely analyzed in terms of the properties of the input data captured by the discriminative model. However, suchanalysis is crucial if we want to understand and visualize the detected differences. We propose an approach to interpretation of the statistical modelin the original feature space that allows us to argue about the model in terms of the relevant changes to the input vectors. For each point in the input space, we define a discriminative direction to be the direction that moves the point towards the other class while introducing as little irrelevant change as possible with respect to the classifier function. Wederive the discriminative direction for kernel-based classifiers, demonstrate the technique on several examples and briefly discuss its use in the statistical shape analysis, an application that originally motivated this work.
Product Analysis: Learning to Model Observations as Products of Hidden Variables
Frey, Brendan J., Kannan, Anitha, Jojic, Nebojsa
Factor analysis and principal components analysis can be used to model linear relationships between observed variables and linearly map high-dimensional data to a lower-dimensional hidden space. In factor analysis, the observations are modeled as a linear combination ofnormally distributed hidden variables. We describe a nonlinear generalization of factor analysis, called "product analysis", thatmodels the observed variables as a linear combination of products of normally distributed hidden variables. Just as factor analysiscan be viewed as unsupervised linear regression on unobserved, normally distributed hidden variables, product analysis canbe viewed as unsupervised linear regression on products of unobserved, normally distributed hidden variables. The mapping betweenthe data and the hidden space is nonlinear, so we use an approximate variational technique for inference and learning.