Technology
Classification on Pairwise Proximity Data
Graepel, Thore, Herbrich, Ralf, Bollmann-Sdorra, Peter, Obermayer, Klaus
We investigate the problem of learning a classification task on data represented in terms of their pairwise proximities. This representation does not refer to an explicit feature representation of the data items and is thus more general than the standard approach of using Euclidean feature vectors, from which pairwise proximities can always be calculated. Our first approach is based on a combined linear embedding and classification procedure, resulting in an extension of the Optimal Hyperplane algorithm to pseudo-Euclidean data. As an alternative we present another approach based on a linear threshold model in the proximity values themselves, which is optimized using Structural Risk Minimization. We show that prior knowledge about the problem can be incorporated by the choice of distance measures, and we examine different metrics w.r.t. their performance.
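A minimal sketch of the embedding step this abstract describes: given a pairwise dissimilarity matrix, double centering and an eigendecomposition yield coordinates in a pseudo-Euclidean space, where axes with negative eigenvalues are kept by scaling with the square root of the absolute eigenvalue. The function name and the use of numpy are illustrative assumptions; a linear classifier would then be trained on the returned coordinates, and this is not the authors' exact Optimal Hyperplane extension.

```python
import numpy as np

def pseudo_euclidean_embedding(D, k):
    """Embed n items, given an n x n pairwise dissimilarity matrix D,
    into a k-dimensional pseudo-Euclidean space.

    Classical-MDS-style double centering, except that negative
    eigenvalues are retained (scaled by sqrt(|lambda|)), as needed
    when the proximities are not Euclidean."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # inner-product matrix
    w, V = np.linalg.eigh(B)                   # eigendecomposition
    idx = np.argsort(-np.abs(w))[:k]           # k largest |eigenvalues|
    X = V[:, idx] * np.sqrt(np.abs(w[idx]))    # embedded coordinates
    signature = np.sign(w[idx])                # +1 Euclidean, -1 pseudo axes
    return X, signature
```

A linear classifier fit on X must respect the signature: inner products along the negative-signature axes enter with a minus sign.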
Multi-Electrode Spike Sorting by Clustering Transfer Functions
Rinberg, Dmitry, Davidowitz, Hanan, Tishby, Naftali
Since every electrode is in a different position, it will measure a different contribution from each of the different neurons. Simply stated, the problem is this: how can these complex signals be untangled to determine when each individual cell fired? This problem is difficult because a) the objects being classified are very similar and often noisy, and b) spikes coming from the same cell can
The Bias-Variance Tradeoff and the Randomized GACV
Wahba, Grace, Lin, Xiwu, Gao, Fangyu, Xiang, Dong, Klein, Ronald, Klein, Barbara
We propose a new in-sample cross-validation-based method (randomized GACV) for choosing smoothing or bandwidth parameters that govern the bias-variance or fit-complexity tradeoff in 'soft' classification. Soft classification refers to a learning procedure which estimates the probability that an example with a given attribute vector is in class 1 vs. class 0. The target for optimizing the tradeoff is the Kullback-Leibler distance between the estimated probability distribution and the 'true' probability distribution, representing knowledge of an infinite population. The method uses a randomized estimate of the trace of a Hessian and mimics cross validation at the cost of a single relearning with perturbed outcome data.
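The computational trick the last sentence alludes to, estimating the trace of a Hessian from a single relearning with randomly perturbed outcomes, can be sketched generically as below. The names `refit` and `eps` are hypothetical placeholders for the learner and the perturbation size; this is a Hutchinson-style estimator in the spirit of the abstract, not the paper's exact GACV formula.

```python
import numpy as np

def randomized_trace(refit, y, eps=1e-3, rng=np.random.default_rng(0)):
    """Randomized trace estimate of the influence operator H.

    refit(y) -> fitted values for outcome vector y. One extra
    relearning with outcomes perturbed along a Rademacher probe z
    gives a directional derivative H @ z, and E[z^T H z] = tr(H)
    for z with i.i.d. +/-1 entries."""
    z = rng.choice([-1.0, 1.0], size=len(y))   # Rademacher probe vector
    f0 = refit(y)                              # fit on original outcomes
    f1 = refit(y + eps * z)                    # single perturbed relearning
    return z @ (f1 - f0) / eps                 # unbiased estimate of tr(H)
```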
Improved Switching among Temporally Abstract Actions
Sutton, Richard S., Singh, Satinder P., Precup, Doina, Ravindran, Balaraman
In robotics and other control applications it is commonplace to have a preexisting set of controllers for solving subtasks, perhaps handcrafted or previously learned or planned, and still face a difficult problem of how to choose and switch among the controllers to solve an overall task as well as possible. In this paper we present a framework based on Markov decision processes and semi-Markov decision processes for phrasing this problem, a basic theorem regarding the improvement in performance that can be obtained by switching flexibly between given controllers, and example applications of the theorem. In particular, we show how an agent can plan with these high-level controllers and then use the results of such planning to find an even better plan, by modifying the existing controllers, with negligible additional cost and no re-planning. In one of our examples, the complexity of the problem is reduced from 24 billion state-action pairs to less than a million state-controller pairs.

In many applications, solutions to parts of a task are known, either because they were handcrafted by people or because they were previously learned or planned. For example, in robotics applications, there may exist controllers for moving joints to positions, picking up objects, controlling eye movements, or navigating along hallways. More generally, an intelligent system may have available to it several temporally extended courses of action to choose from. In such cases, a key challenge is to take full advantage of the existing temporally extended actions, to choose or switch among them effectively, and to plan at their level rather than at the level of individual actions.
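As a sketch of what "planning at the level of controllers" means, the following value iteration backs up through option models rather than primitive actions. The arrays `R` and `P` (an option's expected discounted reward and its discounted termination distribution) are assumed to be given; this is a generic semi-Markov decision process backup, not the paper's specific construction, but it illustrates why a state-controller formulation can be far smaller than the state-action one.

```python
import numpy as np

def option_value_iteration(R, P, n_iter=200):
    """SMDP value iteration over temporally extended actions (options).

    R[s, o]     : expected discounted reward of running option o from state s
    P[s, o, s'] : discounted termination model of option o -- the expected
                  discount accumulated until o halts, spread over halt states
    Returns the optimal value function and state-option values."""
    V = np.zeros(R.shape[0])
    for _ in range(n_iter):
        Q = R + np.einsum('sot,t->so', P, V)   # one SMDP backup per option
        V = Q.max(axis=1)                      # switch greedily among options
    return V, Q
```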
General Bounds on Bayes Errors for Regression with Gaussian Processes
Opper, Manfred, Vivarelli, Francesco
Based on a simple convexity lemma, we develop bounds for different types of Bayesian prediction errors for regression with Gaussian processes. The basic bounds are formulated for a fixed training set. Simpler expressions are obtained for sampling from an input distribution which equals the weight function of the covariance kernel, yielding asymptotically tight results. The results are compared with numerical experiments.
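For concreteness, the Bayesian prediction error that such bounds control is, for a fixed training set, the GP posterior variance at a test input. The following is standard GP algebra with hypothetical argument names, not the paper's convexity-based bound itself.

```python
import numpy as np

def gp_bayes_error(K, k_star, k_ss, noise_var):
    """Posterior variance of GP regression at a test point x:
        sigma^2(x) = k(x, x) - k_x^T (K + noise_var * I)^{-1} k_x
    where K is the training Gram matrix, k_star the covariances between
    x and the training inputs, and k_ss = k(x, x)."""
    A = K + noise_var * np.eye(K.shape[0])
    return k_ss - k_star @ np.linalg.solve(A, k_star)
```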
Learning Lie Groups for Invariant Visual Perception
Rao, Rajesh P. N., Ruderman, Daniel L.
One of the most important problems in visual perception is that of visual invariance: how are objects perceived to be the same despite undergoing transformations such as translations, rotations or scaling? In this paper, we describe a Bayesian method for learning invariances based on Lie group theory. We show that previous approaches based on first-order Taylor series expansions of inputs can be regarded as special cases of the Lie group approach, the latter being capable in principle of handling arbitrarily large transformations. Using a matrix-exponential-based generative model of images, we derive an unsupervised algorithm for learning Lie group operators from input data containing infinitesimal transformations.
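A sketch of the matrix-exponential generative model the abstract describes, contrasted with the first-order Taylor special case (assumes scipy is available; `G` stands for a learned Lie group operator acting on the vectorized image):

```python
import numpy as np
from scipy.linalg import expm

def transform(image_vec, G, s):
    """One-parameter Lie transformation of a vectorized image:
    I(s) = expm(s * G) @ I(0). The matrix exponential handles
    arbitrarily large transformations."""
    return expm(s * G) @ image_vec

def transform_taylor(image_vec, G, s):
    """First-order Taylor approximation, I + s*(G @ I), which is the
    special case accurate only for infinitesimal s."""
    return image_vec + s * (G @ image_vec)
```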
Semiparametric Support Vector and Linear Programming Machines
Smola, Alex J., Frieß, Thilo-Thomas, Schölkopf, Bernhard
In fact, for many of the kernels used (though not the polynomial kernels), such as Gaussian RBF kernels, it can be shown [6] that SV machines are universal approximators. While this is advantageous in general, parametric models are useful techniques in their own right. Especially if one happens to have additional knowledge about the problem, it would be unwise not to take advantage of it. For instance, it might be the case that the major properties of the data are described by a combination of a small set of linearly independent basis functions {φ₁(·), …, φₙ(·)}. Or one may want to correct the data for some (e.g.
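The semiparametric idea is an expansion of the form f(x) = Σᵢ aᵢ k(xᵢ, x) + Σⱼ bⱼ φⱼ(x): a kernel part plus a small explicit parametric part. The sketch below solves the regularized least-squares analogue with the standard semiparametric side condition Φᵀa = 0, as in semiparametric splines, rather than the SV quadratic program; argument names are illustrative.

```python
import numpy as np

def semiparametric_fit(K, Phi, y, lam=1e-2):
    """Fit f(x) = sum_i a_i k(x_i, x) + sum_j b_j phi_j(x).

    K   : n x n kernel Gram matrix on the training inputs
    Phi : n x m design matrix of the explicit basis functions phi_j
    Only the kernel coefficients a are penalized; the parametric
    coefficients b are left unregularized."""
    n, m = K.shape[0], Phi.shape[1]
    A = np.block([[K + lam * np.eye(n), Phi],
                  [Phi.T, np.zeros((m, m))]])   # saddle-point system
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(m)]))
    a, b = sol[:n], sol[n:]
    return a, b
```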
Discovering Hidden Features with Gaussian Processes Regression
Vivarelli, Francesco, Williams, Christopher K. I.
In Gaussian process regression the covariance function typically depends on the inputs through a quadratic form (x − x')ᵀW(x − x'). W is often taken to be diagonal, but if we allow W to be a general positive definite matrix which can be tuned on the basis of training data, then an eigen-analysis of W shows that we are effectively creating hidden features, where the dimensionality of the hidden-feature space is determined by the data. We demonstrate the superiority of predictions using the general matrix over those based on a diagonal matrix on two test problems.
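A sketch of the kernel and the eigen-analysis the abstract refers to, assuming a squared-exponential covariance with a general positive definite W (function names are illustrative):

```python
import numpy as np

def rbf_kernel_W(X1, X2, W):
    """Squared-exponential covariance with a general positive definite
    metric: k(x, x') = exp(-0.5 * (x - x')^T W (x - x'))."""
    diff = X1[:, None, :] - X2[None, :, :]
    return np.exp(-0.5 * np.einsum('abi,ij,abj->ab', diff, W, diff))

def hidden_features(W, tol=1e-8):
    """Eigen-analysis of W: with W = U diag(lam) U^T, the kernel depends
    on the inputs only through z = diag(sqrt(lam)) U^T x -- the 'hidden
    features'. Eigenvalues near zero prune the corresponding directions,
    so the data determine the hidden-feature dimensionality."""
    lam, U = np.linalg.eigh(W)
    keep = lam > tol
    return (np.sqrt(lam[keep]) * U[:, keep]).T   # rows map x -> z
```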