Country
Bayesian Map Learning in Dynamic Environments
We consider the problem of learning a grid-based map using a robot with noisy sensors and actuators. We compare two approaches: online EM, where the map is treated as a fixed parameter, and Bayesian inference, where the map is a (matrix-valued) random variable. We show that even on a very simple example, online EM can get stuck in local minima, which causes the robot to get "lost" and the resulting map to be useless. By contrast, the Bayesian approach, by maintaining multiple hypotheses, is much more robust. Wethen introduce a method for approximating the Bayesian solution, called Rao-Blackwellised particle filtering. We show that this approximation, when coupled with an active learning strategy, is fast but accurate.
Better Generative Models for Sequential Data Problems: Bidirectional Recurrent Mixture Density Networks
This paper describes bidirectional recurrent mixture density networks, whichcan model multi-modal distributions of the type P(Xt Iyf) and P(Xt lXI, X2, ...,Xt-l, yf) without any explicit assumptions aboutthe use of context. These expressions occur frequently in pattern recognition problems with sequential data, for example in speech recognition. Experiments show that the proposed generativemodels give a higher likelihood on test data compared toa traditional modeling approach, indicating that they can summarize the statistical properties of the data better. 1 Introduction Many problems of engineering interest can be formulated as sequential data problems inan abstract sense as supervised learning from sequential data, where an input vector (dimensionality D) sequence X xf {X!,X2, .. .
Maximum Entropy Discrimination
Jaakkola, Tommi, Meila, Marina, Jebara, Tony
We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involvedistributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under thisclass and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of class-conditional models within this framework.
Regular and Irregular Gallager-zype Error-Correcting Codes
Kabashima, Yoshiyuki, Murayama, Tatsuto, Saad, David, Vicente, Renato
The performance of regular and irregular Gallager-type errorcorrecting codeis investigated via methods of statistical physics. The transmitted codeword comprises products of the original message bitsselected by two randomly-constructed sparse matrices; the number of nonzero row/column elements in these matrices constitutes a family of codes. We show that Shannon's channel capacity may be saturated in equilibrium for many of the regular codes while slightly lower performance is obtained for others which may be of higher practical relevance. Decoding aspects are considered byemploying the TAP approach which is identical to the commonly used belief-propagation-based decoding. We show that irregular codes may saturate Shannon's capacity but with improved dynamical properties. 1 Introduction The ever increasing information transmission in the modern world is based on reliably communicatingmessages through noisy transmission channels; these can be telephone lines, deep space, magnetic storing media etc. Error-correcting codes play a significant role in correcting errors incurred during transmission; this is carried out by encoding the message prior to transmission and decoding the corrupted received code-word for retrieving the original message.
Hierarchical Image Probability (H1P) Models
We formulate a model for probability distributions on image spaces. We show that any distribution of images can be factored exactly into conditional distributionsof feature vectors at one resolution (pyramid level) conditioned on the image information at lower resolutions. We would like to factor this over positions in the pyramid levels to make it tractable, but such factoring may miss long-range dependencies. To fix this, we introduce hiddenclass labels at each pixel in the pyramid. The result is a hierarchical mixture of conditional probabilities, similar to a hidden Markov model on a tree. The model parameters can be found with maximum likelihoodestimation using the EM algorithm. We have obtained encouraging preliminary results on the problems of detecting various objects inSAR images and target recognition in optical aerial images. 1 Introduction
A Geometric Interpretation of v-SVM Classifiers
Crisp, David J., Burges, Christopher J. C.
We show that the recently proposed variant of the Support Vector machine (SVM) algorithm, known as v-SVM, can be interpreted as a maximal separation between subsets of the convex hulls of the data, which we call soft convex hulls. The soft convex hulls are controlled by choice of the parameter v. If the intersection of the convex hulls is empty, the hyperplane is positioned halfway between them such that the distance between convex hulls, measured along the normal, is maximized; and if it is not, the hyperplane's normal is similarly determined by the soft convex hulls, but its position (perpendicular distance from the origin) is adjusted to minimize the error sum. The proposed geometric interpretation of v-SVM also leads to necessary and sufficient conditions for the existence of a choice of v for which the v-SVM solution is nontrivial. 1 Introduction Recently, SchOlkopf et al. [I) introduced a new class of SVM algorithms, called v-SVM, for both regression estimation and pattern recognition. The basic idea is to remove the user-chosen error penalty factor C that appears in SVM algorithms by introducing a new variable p which, in the pattern recognition case, adds another degree of freedom to the margin.
Acquisition in Autoshaping
However, most models have simply ignored these data; the few that have attempted toaddress them have failed by at least an order of magnitude. We discuss key data on the speed of acquisition, and show how to account for them using a statistically sound model of learning, in which differential reliabilities of stimuli playa crucial role. 1 Introduction Conditioning experiments probe the ways that animals make predictions about rewards and punishments and how those predictions are used to their advantage. Substantial quantitative data are available as to how pigeons and rats acquire conditioned responsesduring autoshaping, which is one of the simplest paradigms of classical conditioning.
Channel Noise in Excitable Neural Membranes
Manwani, Amit, Steinmetz, Peter N., Koch, Christof
Stochastic fluctuations of voltage-gated ion channels generate current and voltage noise in neuronal membranes. This noise may be a critical determinantof the efficacy of information processing within neural systems. Using Monte-Carlo simulations, we carry out a systematic investigation ofthe relationship between channel kinetics and the resulting membrane voltage noise using a stochastic Markov version of the Mainen-Sejnowski model of dendritic excitability in cortical neurons. Our simulations show that kinetic parameters which lead to an increase in membrane excitability (increasing channel densities, decreasing temperature) alsolead to an increase in the magnitude of the sub-threshold voltage noise. Noise also increases as the membrane is depolarized from rest towards threshold. This suggests that channel fluctuations may interfere witha neuron's ability to function as an integrator of its synaptic inputs and may limit the reliability and precision of neural information processing.
Image Recognition in Context: Application to Microscopic Urinalysis
Song, Xubo B., Sill, Joseph, Abu-Mostafa, Yaser S., Kasdan, Harvey
We propose a new and efficient technique for incorporating contextual information into object classification. Most of the current techniques face the problem of exponential computation cost. In this paper, we propose a new general framework that incorporates partial context at a linear cost. This technique is applied to microscopic urinalysis image recognition, resulting in a significant improvement of recognition rate over the context free approach. This gain would have been impossible using conventional context incorporation techniques.