Technology
General Bounds on Bayes Errors for Regression with Gaussian Processes
Opper, Manfred, Vivarelli, Francesco
Based on a simple convexity lemma, we develop bounds for different types of Bayesian prediction errors for regression with Gaussian processes. The basic bounds are formulated for a fixed training set. Simpler expressions are obtained for sampling from an input distribution which equals the weight function of the covariance kernel, yielding asymptotically tight results. The results are compared with numerical experiments.
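As a hedged illustration of the kind of quantity such bounds control: assuming the Mercer eigenvalues lambda_k of the covariance kernel with respect to the input distribution are known, a lower-bound-style expression for the Bayes error of GP regression with noise variance sigma^2 and n training points that appears in the learning-curve literature is sum_k sigma^2 * lambda_k / (sigma^2 + n * lambda_k). The spectrum below is purely illustrative and the code is a sketch, not the paper's derivation.

# Hedged sketch: a bound-style estimate of the Bayes error for GP regression,
# computed from assumed Mercer eigenvalues of the covariance kernel.
# The geometric eigenvalue decay used here is purely illustrative.
import numpy as np

def bayes_error_bound(eigenvalues, n, noise_var):
    """sum_k noise_var * lambda_k / (noise_var + n * lambda_k)."""
    lam = np.asarray(eigenvalues, dtype=float)
    return np.sum(noise_var * lam / (noise_var + n * lam))

lam = 0.5 ** np.arange(1, 31)          # assumed eigenvalues lambda_k
for n in (10, 100, 1000):
    print(n, bayes_error_bound(lam, n, noise_var=0.1))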
Learning Lie Groups for Invariant Visual Perception
Rao, Rajesh P. N., Ruderman, Daniel L.
One of the most important problems in visual perception is that of visual invariance: how are objects perceived to be the same despite undergoing transformations such as translations, rotations or scaling? In this paper, we describe a Bayesian method for learning invariances based on Lie group theory. We show that previous approaches based on first-order Taylor series expansions of inputs can be regarded as special cases of the Lie group approach, the latter being capable in principle of handling arbitrarily large transformations. Using a matrix-exponential based generative model of images, we derive an unsupervised algorithm for learning Lie group operators from input data containing infinitesimal transformations.
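A minimal sketch of the matrix-exponential idea, using the standard generator of planar rotations rather than the paper's learned image operators: exponentiating a scaled infinitesimal generator recovers the full, arbitrarily large transformation.

# Hedged sketch: the matrix-exponential view of a Lie group transformation.
# G is the standard infinitesimal generator of 2-D rotations; exp(theta * G)
# is the rotation by theta, showing how large transformations arise from an
# infinitesimal operator. (Illustrative only; not the paper's image model.)
import numpy as np
from scipy.linalg import expm

G = np.array([[0.0, -1.0],
              [1.0,  0.0]])            # generator of planar rotations

theta = np.pi / 3                       # a finite (non-infinitesimal) angle
R = expm(theta * G)                     # exp(theta * G) = rotation by theta

expected = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
print(np.allclose(R, expected))         # True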
The Bias-Variance Tradeoff and the Randomized GACV
Wahba, Grace, Lin, Xiwu, Gao, Fangyu, Xiang, Dong, Klein, Ronald, Klein, Barbara
We propose a new in-sample cross-validation based method (randomized GACV) for choosing smoothing or bandwidth parameters that govern the bias-variance or fit-complexity tradeoff in 'soft' classification. Soft classification refers to a learning procedure which estimates the probability that an example with a given attribute vector is in class 1 vs. class 0. The target for optimizing the tradeoff is the Kullback-Leibler distance between the estimated probability distribution and the 'true' probability distribution, representing knowledge of an infinite population. The method uses a randomized estimate of the trace of a Hessian and mimics cross validation at the cost of a single relearning with perturbed outcome data.
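The randomized trace estimate mentioned above can be illustrated with a generic Hutchinson-style sketch: the trace of a large matrix (standing in here for the Hessian or influence matrix) is estimated from matrix-vector products with random probe vectors, so only one extra perturbed solve per probe is needed. This shows the generic technique, not the paper's GACV formula.

# Hedged sketch of randomized trace estimation (Hutchinson's estimator).
import numpy as np

def hutchinson_trace(matvec, dim, n_probes=20, rng=None):
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_probes):
        delta = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe vector
        total += delta @ matvec(delta)              # delta' A delta
    return total / n_probes

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
A = A @ A.T                                         # symmetric PSD test matrix
print(np.trace(A), hutchinson_trace(lambda v: A @ v, 200, n_probes=100, rng=1))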
Semiparametric Support Vector and Linear Programming Machines
Smola, Alex J., Frieß, Thilo-Thomas, Schölkopf, Bernhard
In fact, for many of the kernels used (though not the polynomial kernels), such as Gaussian RBF kernels, it can be shown [6] that SV machines are universal approximators. While this is advantageous in general, parametric models are useful techniques in their own right. Especially if one happens to have additional knowledge about the problem, it would be unwise not to take advantage of it. For instance, it might be the case that the major properties of the data are described by a combination of a small set of linearly independent basis functions {φ₁(·), ..., φₙ(·)}. Or one may want to correct the data for some (e.g.
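A hedged sketch of the semiparametric ansatz this excerpt points toward: a kernel expansion plus a small set of parametric basis functions phi_j. How the coefficients alpha and beta are obtained (the SV or linear programming optimization) is omitted; only the model form is shown, and the basis used is an assumption for illustration.

# Hedged sketch: f(x) = sum_i alpha_i k(x_i, x) + sum_j beta_j phi_j(x).
import numpy as np

def rbf_kernel(x, x_train, gamma=1.0):
    return np.exp(-gamma * (x[:, None] - x_train[None, :]) ** 2)

def predict(x, x_train, alpha, beta, basis, gamma=1.0):
    """Evaluate the kernel expansion plus the parametric part."""
    K = rbf_kernel(x, x_train, gamma)
    Phi = np.column_stack([phi(x) for phi in basis])
    return K @ alpha + Phi @ beta

# Illustrative basis: constant and linear trend (assumed, not from the paper).
basis = [lambda x: np.ones_like(x), lambda x: x]
x_train = np.linspace(0, 1, 5)
alpha = np.zeros(5)
beta = np.array([0.5, 2.0])
print(predict(np.array([0.0, 0.5, 1.0]), x_train, alpha, beta, basis))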
Discovering Hidden Features with Gaussian Processes Regression
Vivarelli, Francesco, Williams, Christopher K. I.
In Gaussian process regression, the matrix W appearing in the covariance kernel is often taken to be diagonal, but if we allow W to be a general positive definite matrix which can be tuned on the basis of training data, then an eigen-analysis of W shows that we are effectively creating hidden features, where the dimensionality of the hidden-feature space is determined by the data. We demonstrate the superiority of predictions using the general matrix over those based on a diagonal matrix on two test problems.
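A hedged sketch of the eigen-analysis the abstract describes, using a Gaussian covariance kernel parametrized by a general positive definite matrix W; the exact kernel form is an assumption consistent with the abstract, not taken verbatim from the paper. The "hidden features" are projections onto the eigenvectors of W with non-negligible eigenvalues.

# Hedged sketch: a Gaussian kernel with a general W and its eigen-analysis.
import numpy as np

def gaussian_kernel(x, xp, W):
    d = x - xp
    return np.exp(-0.5 * d @ W @ d)

# A nearly rank-one W: the data effectively vary along a single direction.
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
W = 4.0 * np.outer(u, u) + 1e-3 * np.eye(2)

eigvals, eigvecs = np.linalg.eigh(W)
print("eigenvalues:", eigvals)                        # one dominant eigenvalue
print("hidden feature direction:", eigvecs[:, np.argmax(eigvals)])

x, xp = np.array([0.2, -0.1]), np.array([0.5, 0.3])
print("k(x, x') =", gaussian_kernel(x, xp, W))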
Information Maximization in Single Neurons
Stemmler, Martin, Koch, Christof
Information from the senses must be compressed into the limited range of firing rates generated by spiking nerve cells. Optimal compression uses all firing rates equally often, implying that the nerve cell's response matches the statistics of naturally occurring stimuli. Since changing the voltage-dependent ionic conductances in the cell membrane alters the flow of information, an unsupervised, non-Hebbian, developmental learning rule is derived to adapt the conductances in Hodgkin-Huxley model neurons. By maximizing the rate of information transmission, each firing rate within the model neuron's limited dynamic range is used equally often. An efficient neuronal representation of incoming sensory information should take advantage of the regularity and scale invariance of stimulus features in the natural world. In the case of vision, this regularity is reflected in the typical probabilities of encountering particular visual contrasts, spatial orientations, or colors [1]. Given these probabilities, an optimized neural code would eliminate any redundancy, while devoting increased representation to commonly encountered features. At the level of a single spiking neuron, information about a potentially large range of stimuli is compressed into a finite range of firing rates, since the maximum firing rate of a neuron is limited. Optimizing the information transmission through a single neuron in the presence of uniform, additive noise has an intuitive interpretation: the most efficient representation of the input uses every firing rate with equal probability.
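A hedged sketch of the "every firing rate used equally often" principle stated above: if the input-output function maps a stimulus through its own cumulative distribution (histogram equalization), the resulting rates are uniform over the cell's dynamic range. This illustrates the information-maximization criterion only, not the Hodgkin-Huxley learning rule derived in the paper.

# Hedged sketch: histogram equalization yields equal use of firing rates.
import numpy as np

rng = np.random.default_rng(0)
stimulus = rng.exponential(scale=1.0, size=100_000)   # skewed "natural" input

# Empirical CDF as the optimal input-output function, scaled to (0, r_max].
r_max = 100.0                                         # spikes/s, illustrative
ranks = np.argsort(np.argsort(stimulus))              # rank of each stimulus
rates = r_max * (ranks + 1) / len(stimulus)

# The firing-rate histogram is (approximately) flat.
counts, _ = np.histogram(rates, bins=10, range=(0, r_max))
print(counts)                                         # near-equal bin counts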
Lazy Learning Meets the Recursive Least Squares Algorithm
Birattari, Mauro, Bontempi, Gianluca, Bersini, Hugues
Lazy learning is a memory-based technique that, once a query is received, computes a prediction by locally interpolating the neighboring examples of the query that are considered relevant according to a distance measure. In this paper we propose a data-driven method to select, on a query-by-query basis, the optimal number of neighbors to be considered for each prediction. As an efficient way to identify and validate local models, the recursive least squares algorithm is introduced in the context of local approximation and lazy learning. Furthermore, besides the winner-takes-all strategy for model selection, a local combination of the most promising models is explored. The proposed method is tested on six different datasets and compared with a state-of-the-art approach.
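A hedged sketch of the recursive least squares (RLS) update that makes it cheap to grow a local linear model one neighbor at a time, so that models with different numbers of neighbors can be identified and validated incrementally. The query-by-query neighbor selection and model combination of the paper are not reproduced here.

# Hedged sketch: standard RLS update for a linear model y ~ x @ theta.
import numpy as np

def rls_update(theta, P, x, y, lam=1.0):
    """Incorporate one example (x, y) into the running least squares fit."""
    Px = P @ x
    k = Px / (lam + x @ Px)              # gain vector
    theta = theta + k * (y - x @ theta)  # correct by the prediction error
    P = (P - np.outer(k, Px)) / lam      # update inverse covariance
    return theta, P

rng = np.random.default_rng(0)
d = 3
theta = np.zeros(d)
P = 1e3 * np.eye(d)                      # large initial covariance
true_w = np.array([1.0, -2.0, 0.5])
for _ in range(200):
    x = rng.standard_normal(d)
    y = x @ true_w + 0.01 * rng.standard_normal()
    theta, P = rls_update(theta, P, x, y)
print(theta)                             # close to true_w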
Optimizing Classifiers for Imbalanced Training Sets
Karakoulas, Grigoris I., Shawe-Taylor, John
Following recent results [9, 8] showing the importance of the fat-shattering dimension in explaining the beneficial effect of a large margin on generalization performance, the current paper investigates the implications of these results for the case of imbalanced datasets and develops two approaches to setting the threshold. The approaches are incorporated into ThetaBoost, a boosting algorithm for dealing with unequal loss functions. The performance of ThetaBoost and the two approaches is tested experimentally.
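As background for the threshold-setting idea with unequal losses, a hedged sketch of the textbook cost-sensitive rule: with false-positive cost c_fp and false-negative cost c_fn, the cost-minimizing threshold on an estimated class-1 probability is c_fp / (c_fp + c_fn). This is generic background only and is not the ThetaBoost procedure or either of the paper's two approaches.

# Hedged sketch: cost-sensitive thresholding of estimated probabilities.
import numpy as np

def cost_sensitive_predict(p_class1, c_fp, c_fn):
    threshold = c_fp / (c_fp + c_fn)
    return (p_class1 > threshold).astype(int)

p = np.array([0.1, 0.3, 0.45, 0.7, 0.9])
print(cost_sensitive_predict(p, c_fp=1.0, c_fn=1.0))   # symmetric: threshold 0.5
print(cost_sensitive_predict(p, c_fp=1.0, c_fn=4.0))   # costly misses: threshold 0.2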