On the Generalization Ability of On-Line Learning Algorithms
Cesa-Bianchi, Nicolò, Conconi, Alex, Gentile, Claudio
In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good data-dependent tail bounds on their risk. Our results are proven without requiring complicated concentration-of-measure arguments and they hold for arbitrary online learning algorithms. Furthermore, when applied to concrete online algorithms, our results yield tail bounds that in many cases are comparable to or better than the best known bounds.
Sampling Techniques for Kernel Methods
Achlioptas, Dimitris, McSherry, Frank, Schölkopf, Bernhard
We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations. Rather intriguingly, all three techniques can be viewed as instantiations of the following idea: replace the kernel function k by a "randomized kernel" which behaves like k in expectation.
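The "behaves like the kernel in expectation" idea can be illustrated with a small sketch. This is only one simple unbiased scheme, assumed here for illustration and not necessarily the sampling rule used in the paper: each off-diagonal Gram entry is kept with probability p and rescaled by 1/p, so the expected value of every entry equals the original. The names `sparsify_gram` and `mean_of_samples` are hypothetical.

```python
import random


def sparsify_gram(K, p, rng):
    """Return a random sparse copy of the symmetric Gram matrix K.

    Assumed scheme (illustrative only): each off-diagonal pair (i, j) is kept
    with probability p and divided by p, otherwise set to zero. The diagonal
    is kept exactly. Every entry of the result has expectation K[i][j].
    """
    n = len(K)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        out[i][i] = K[i][i]
        for j in range(i + 1, n):
            if rng.random() < p:
                out[i][j] = out[j][i] = K[i][j] / p
    return out


def mean_of_samples(K, p, trials, seed=0):
    """Average many sparsified copies to check unbiasedness empirically."""
    rng = random.Random(seed)
    n = len(K)
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(trials):
        S = sparsify_gram(K, p, rng)
        for i in range(n):
            for j in range(n):
                acc[i][j] += S[i][j] / trials
    return acc
```

Averaging many sparsified copies, for example `mean_of_samples(K, 0.5, 20000)`, should come out close to `K` entry by entry, while any single copy has roughly half of its off-diagonal entries set to zero.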
Associative memory in realistic neuronal networks
Almost two decades ago, Hopfield [1] showed that networks of highly reduced model neurons can exhibit multiple attracting fixed points, thus providing a substrate for associative memory. It is still not clear, however, whether realistic neuronal networks can support multiple attractors. The main difficulty is that neuronal networks in vivo exhibit a stable background state at low firing rate, typically a few Hz. Embedding attractors is easy; doing so without destabilizing the background is not. Previous work [2, 3] focused on the sparse coding limit, in which a vanishingly small number of neurons are involved in any memory. Here we investigate the case in which the number of neurons involved in a memory scales with the number of neurons in the network.
Motivated Reinforcement Learning
Competition between actions is based on the motivating characteristics of their consequent states in this sense. Substantial, careful experiments, reviewed in Dickinson & Balleine [12, 13], into the neurobiology and psychology of motivation show that this view is incomplete. In many cases, animals are faced with the choice not between many different actions at a given state, but rather whether a single response is worth executing at all. Evidence suggests that the motivational process underlying this choice has different psychological and neural properties from that underlying action choice. We describe and model these motivational systems, and consider the way they interact.
KLD-Sampling: Adaptive Particle Filters
Over the last years, particle filters have been applied with great success to a variety of state estimation problems. We present a statistical approach to increasing the efficiency of particle filters by adapting the size of sample sets on-the-fly. The key idea of the KLD-sampling method is to bound the approximation error introduced by the sample-based representation of the particle filter. The name KLD-sampling is due to the fact that we measure the approximation error by the Kullback-Leibler distance. Our adaptation approach chooses a small number of samples if the density is focused on a small part of the state space, and it chooses a large number of samples if the state uncertainty is high. Both the implementation and computation overhead of this approach are small. Extensive experiments using mobile robot localization as a test application show that our approach yields drastic improvements over particle filters with fixed sample set sizes and over a previously introduced adaptation technique.
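The abstract above does not state the sample-size rule, so the following is a sketch under assumptions: it uses the bound as it is commonly quoted for KLD-sampling, where k is the number of histogram bins with support, epsilon bounds the Kullback-Leibler distance, delta is the allowed failure probability, and the chi-square quantile is approximated with the Wilson-Hilferty transformation. The function name `kld_sample_size` and the default parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def kld_sample_size(k, epsilon=0.05, delta=0.01):
    """Number of particles needed so that, with probability 1 - delta, the
    KL distance between the sample-based estimate and the true (binned)
    posterior stays below epsilon.

    k is the number of bins that currently contain at least one particle.
    Assumed form: n = chi2_{k-1, 1-delta} / (2 * epsilon), with the
    chi-square quantile replaced by its Wilson-Hilferty approximation.
    """
    if k < 2:
        return 1  # a single occupied bin gives no degrees of freedom
    z = NormalDist().inv_cdf(1.0 - delta)  # upper standard normal quantile
    a = 2.0 / (9.0 * (k - 1))
    n = (k - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z) ** 3
    return math.ceil(n)
```

With this rule a belief concentrated in a few bins (small k) needs few particles, while a spread-out belief (large k) needs many, which matches the behaviour described in the abstract.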
K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms
Vincent, Pascal, Bengio, Yoshua
Guided by an initial idea of building a complex (nonlinear) decision surface with maximal local margin in input space, we give a possible geometrical intuition as to why K-Nearest Neighbor (KNN) algorithms often perform more poorly than SVMs on classification tasks. We then propose modified K-Nearest Neighbor algorithms to overcome the perceived problem. The approach is similar in spirit to Tangent Distance, but with invariances inferred from the local neighborhood rather than prior knowledge. Experimental results on real world classification tasks suggest that the modified KNN algorithms often give a dramatic improvement over standard KNN and perform as well as or better than SVMs.
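The abstract does not spell out the modified algorithms, so the sketch below is one reading of the "K-local hyperplane" idea in the title, offered as an assumption: for each class, take the K training points of that class nearest to the query, measure the distance from the query to the affine hull of those points, and predict the class with the smallest distance. All function names are hypothetical, and details of the paper such as regularization and the convex-distance variant are omitted.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def k_nearest(x, points, k):
    """The k points closest to x in Euclidean distance."""
    return sorted(points, key=lambda p: _dot(_sub(p, x), _sub(p, x)))[:k]


def hyperplane_distance(x, neighbors):
    """Distance from x to the affine hull of the given neighbors.

    Builds an orthonormal basis of the hull's direction space with
    Gram-Schmidt, then removes the in-hull component of x - neighbors[0].
    """
    origin = neighbors[0]
    basis = []
    for p in neighbors[1:]:
        v = _sub(p, origin)
        for b in basis:
            c = _dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(_dot(v, v))
        if norm > 1e-12:  # skip directions already spanned
            basis.append([vi / norm for vi in v])
    r = _sub(x, origin)
    for b in basis:
        c = _dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(_dot(r, r))


def classify(x, data_by_class, k):
    """Predict the class whose local hyperplane lies closest to x."""
    return min(
        data_by_class,
        key=lambda label: hyperplane_distance(
            x, k_nearest(x, data_by_class[label], k)
        ),
    )
```

As a toy case, let class "a" lie on the line y = 0 at x = 0, 1, 2, 3 and class "b" on the line y = 1 at x = 0, 1, 3. For the query (2, 0.6) the single nearest training point belongs to "a", but with k = 2 the local hyperplane of "b" is closer, so `classify` returns "b".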
Relative Density Nets: A New Way to Combine Backpropagation with HMM's
Brown, Andrew D., Hinton, Geoffrey E.
Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. A standard way of performing classification using a generative model is to divide the training cases into their respective classes and then train a set of class conditional models. This unsupervised approach to classification is appealing for two reasons.
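The opening claim of this abstract, that a logistic unit computes the relative probability of a point under two Gaussians, can be checked numerically. The sketch below assumes two spherical Gaussians with a shared variance and equal priors, which is the simplest case in which the identity holds exactly; the function names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a spherical Gaussian with variance var per dimension."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * d * math.log(2 * math.pi * var) - sq / (2 * var)


def posterior_class1(x, mean1, mean2, var):
    """p1(x) / (p1(x) + p2(x)): relative probability under the two Gaussians."""
    l1 = gaussian_logpdf(x, mean1, var)
    l2 = gaussian_logpdf(x, mean2, var)
    return 1.0 / (1.0 + math.exp(l2 - l1))


def logistic_unit(x, mean1, mean2, var):
    """Logistic unit whose weights and bias are derived from the two means.

    w = (mean1 - mean2) / var
    b = (|mean2|^2 - |mean1|^2) / (2 * var)
    """
    w = [(a - b) / var for a, b in zip(mean1, mean2)]
    bias = (sum(b * b for b in mean2) - sum(a * a for a in mean1)) / (2 * var)
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

For any input, `posterior_class1` and `logistic_unit` agree up to floating-point error, because the quadratic terms in x cancel when the two variances are equal.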
Kernel Logistic Regression and the Import Vector Machine
The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an ongoing research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the "support points" of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.
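To make the prediction side of this concrete, here is a minimal sketch of how a kernel logistic regression model restricted to a small set of "import points" could produce a class probability. The RBF kernel, the coefficient values, and the function names are assumptions for illustration; the abstract does not describe how the import points are selected or how the coefficients are fitted, and neither is shown here.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel (an assumed choice; any kernel could be used)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def klr_probability(x, import_points, coeffs, bias=0.0, gamma=1.0):
    """Estimated P(y = 1 | x) from a kernel expansion over the import points.

    f(x) = bias + sum_i coeffs[i] * k(x, import_points[i])
    P(y = 1 | x) = 1 / (1 + exp(-f(x)))
    """
    f = bias + sum(
        a * rbf_kernel(x, p, gamma) for a, p in zip(coeffs, import_points)
    )
    return 1.0 / (1.0 + math.exp(-f))
```

The cost of one prediction grows with the number of import points rather than with the full training set size, which is where the computational advantage described above would come from.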