Training Algorithms for Hidden Markov Models using Entropy Based Distance Functions
Singer, Yoram, Warmuth, Manfred K.
By adapting a framework used for supervised learning, we construct iterative algorithms that maximize the likelihood of the observations while also attempting to stay "close" to the current estimated parameters. We use a bound on the relative entropy between the two HMMs as a distance measure between them. The result is a family of new iterative training algorithms which are similar to the EM (Baum-Welch) algorithm for training HMMs. The proposed algorithms are composed of a step similar to the expectation step of Baum-Welch and a new update of the parameters which replaces the maximization (re-estimation) step. The algorithms take only negligibly more time per iteration, and an approximated version uses the same expectation step as Baum-Welch.
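The new parameter update can be pictured as an exponentiated-gradient step that trades likelihood gain against relative entropy to the current parameters. The sketch below is a minimal illustration of that style of update on a single probability vector; the learning rate eta and the toy counts are placeholders, and the paper's actual update differs in its details.

    import numpy as np

    def eg_update(p_old, grad, eta):
        """Entropy-regularized (exponentiated-gradient) update of a
        probability vector: trades off eta * <grad, p> against the
        relative entropy to p_old, then renormalizes."""
        w = p_old * np.exp(eta * grad)
        return w / w.sum()

    # Toy example: expected transition counts from an expectation step.
    counts = np.array([3.0, 1.0, 6.0])   # hypothetical expected counts
    p_old = np.array([0.2, 0.3, 0.5])    # current transition-probability row
    grad = counts / p_old                # gradient of the log-likelihood
    print(eg_update(p_old, grad, eta=0.05))

A small eta keeps the new parameters close to the old ones; letting eta grow recovers something closer to plain re-estimation.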
Complex-Cell Responses Derived from Center-Surround Inputs: The Surprising Power of Intradendritic Computation
Mel, Bartlett W., Ruderman, Daniel L., Archie, Kevin A.
Biophysical modeling studies have previously shown that cortical pyramidal cells driven by strong NMDA-type synaptic currents and/or containing dendritic voltage-dependent Ca or Na channels respond more strongly when synapses are activated in several spatially clustered groups of optimal size, in comparison to the same number of synapses activated diffusely about the dendritic arbor [8]. The nonlinear intradendritic interactions giving rise to this "cluster sensitivity" property are akin to a layer of virtual nonlinear "hidden units" in the dendrites, with implications for the cellular basis of learning and memory [7, 6], and for certain classes of nonlinear sensory processing [8]. In the present study, we show that a single neuron, with access only to excitatory inputs from unoriented ON- and OFF-center cells in the LGN, exhibits the principal nonlinear response properties of a "complex" cell in primary visual cortex, namely orientation tuning coupled with translation invariance and contrast insensitivity. We conjecture that this type of intradendritic processing could explain how complex cell responses can persist in the absence of oriented simple cell input [13].
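Cluster sensitivity can be caricatured with a two-layer "sum of nonlinear subunits" model: each dendritic branch applies an expansive nonlinearity to its summed synaptic drive, so the same total input produces a larger response when concentrated on a few branches. The sketch below is a hedged toy; the branch layout, the squaring nonlinearity, and the input values are all invented for the example.

    import numpy as np

    def cell_response(branch_inputs, power=2.0):
        """Sum of nonlinear dendritic 'subunits': each branch applies an
        expansive nonlinearity to its summed synaptic drive."""
        return sum(np.sum(b) ** power for b in branch_inputs)

    # Same total synaptic drive (8 units), clustered vs. diffuse:
    clustered = [np.ones(4), np.ones(4), np.zeros(4), np.zeros(4)]
    diffuse = [np.ones(4) * 0.5] * 4
    print(cell_response(clustered))  # 4^2 + 4^2 = 32
    print(cell_response(diffuse))    # 4 * 2^2 = 16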
Noisy Spiking Neurons with Temporal Coding have more Computational Power than Sigmoidal Neurons
Maass, Wolfgang
Furthermore, it is shown that networks of noisy spiking neurons with temporal coding have a strictly larger computational power than sigmoidal neural nets with the same number of units.
1 Introduction and Definitions
We consider a formal model SNN for a spiking neuron network that is basically a reformulation of the spike response model (and of the leaky integrate-and-fire model) without using δ-functions (see [Maass, 1996a] or [Maass, 1996b] for further background).
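The temporal code at issue encodes an analog value in the timing of a single spike, e.g., as a firing delay that shrinks as the input grows. The snippet below is only a toy illustration of that encoding, not the SNN model itself; the linear delay map and t_max are invented for the example.

    import numpy as np

    def latency_encode(x, t_max=10.0):
        """Temporal coding toy: an analog value in [0, 1] becomes the
        delay of a single spike; larger inputs fire earlier."""
        return t_max * (1.0 - np.asarray(x, dtype=float))

    x = np.array([0.9, 0.2, 0.6])
    print(latency_encode(x))  # spike times [1., 8., 4.] carry the values

A downstream unit sensitive to the relative timing of such delayed spikes can then compute on the encoded analog values directly in the time domain.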
An Orientation Selective Neural Network for Pattern Identification in Particle Detectors
Abramowicz, Halina, Horn, David, Naftaly, Ury, Sahar-Pikielny, Carmit
Constructing a multi-layered neural network with fixed architecture which implements orientation selectivity, we define output elements corresponding to different orientations, which allow us to make a selection decision. The algorithm takes into account the granularity of the lattice as well as the presence of noise and inefficiencies. The method is applied to a sample of data collected with the ZEUS detector at HERA in order to identify cosmic muons that leave a linear pattern of signals in the segmented calorimeter. A two-dimensional representation of the relevant part of the detector is used. The algorithm performs very well. Given its architecture, this system becomes a good candidate for fast pattern recognition in parallel processing devices.
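A minimal way to picture orientation-selective output elements on a lattice is to give each element the strongest summed activation along lines of its preferred orientation and pick the winner. The sketch below does this for four discrete orientations on a toy 8x8 grid with one noise hit; the grid, the orientation set, and the scoring rule are all simplifications invented for the example.

    import numpy as np

    def orientation_scores(grid):
        """Score 4 discrete orientations on a square lattice by the
        strongest summed response along lines of each orientation."""
        g = np.asarray(grid, dtype=float)
        n = g.shape[0]
        return {
            "0 deg": g.sum(axis=1).max(),    # rows
            "90 deg": g.sum(axis=0).max(),   # columns
            "45 deg": max(np.trace(np.fliplr(g), k) for k in range(-n + 1, n)),
            "135 deg": max(np.trace(g, k) for k in range(-n + 1, n)),
        }

    track = np.eye(8)    # a diagonal "muon track" on the lattice
    track[2, 5] = 1      # plus one noise hit
    scores = orientation_scores(track)
    print(max(scores, key=scores.get))  # -> '135 deg'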
Rapid Visual Processing using Spike Asynchrony
Thorpe, Simon J., Gautrais, Jacques
We have investigated the possibility that rapid processing in the visual system could be achieved by using the order of firing in different neurones as a code, rather than more conventional firing rate schemes. Using SPIKENET, a neural net simulator based on integrate-and-fire neurones and in which neurones in the input layer function as analog-to-delay converters, we have modeled the initial stages of visual processing. Initial results are extremely promising. Even with activity in retinal output cells limited to one spike per neuron per image (effectively ruling out any form of rate coding), sophisticated processing based on asynchronous activation was nonetheless possible.
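One concrete way to read an order-of-firing code is rank-order decoding: process inputs in the order their single spikes arrive and shrink each successive spike's contribution. The sketch below is a hedged toy along those lines; the modulation factor and the weights are invented, and this is not SPIKENET's actual machinery.

    import numpy as np

    def rank_order_response(intensities, weights, mod=0.8):
        """One spike per input neurone: stronger inputs fire first, and
        each successive spike is weighted by a shrinking modulation."""
        order = np.argsort(-np.asarray(intensities))  # earliest spikes first
        return sum(weights[i] * mod ** rank for rank, i in enumerate(order))

    intens = np.array([0.9, 0.1, 0.5])
    w = np.array([1.0, 1.0, 1.0])
    print(rank_order_response(intens, w))

Because the earliest spikes dominate the sum, a response can be produced before most input neurones have fired at all, which is the point of the asynchronous scheme.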
Removing Noise in On-Line Search using Adaptive Batch Sizes
Orr, Genevieve B.
Stochastic (online) learning can be faster than batch learning. However, at late times, the learning rate must be annealed to remove the noise present in the stochastic weight updates. In this annealing phase, the convergence rate (in mean square) is at best proportional to 1/T where T is the number of input presentations. An alternative is to increase the batch size to remove the noise. In this paper we explore convergence for LMS using 1) small but fixed batch sizes and 2) an adaptive batch size. We show that the best adaptive batch schedule is exponential and has a rate of convergence which is the same as for annealing, i.e., at best proportional to 1/T.
1 Introduction
Stochastic (online) learning can speed learning over batch training, particularly when data sets are large and contain redundant information [Møl93]. However, at late times in learning, noise present in the weight updates prevents complete convergence from taking place. To reduce the noise, the learning rate is slowly decreased (annealed) at late times. The optimal annealing schedule is asymptotically proportional to 1/t, where t is the iteration [Gol87, LO93, Orr95].
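The exponential batch schedule can be pictured on a toy LMS problem: keep the learning rate fixed and grow the batch size by a constant factor each step, so gradient noise shrinks as training proceeds. The sketch below is a minimal illustration on an invented 1-D regression problem; the growth factor, learning rate, and noise level are placeholders, not values from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def lms_adaptive_batch(n_steps=30, lr=0.1, growth=1.3):
        """LMS on a 1-D toy problem (target weight 2.0) with an
        exponentially growing batch size instead of an annealed rate."""
        w, batch = 0.0, 1.0
        for _ in range(n_steps):
            m = int(batch)
            x = rng.normal(size=m)
            y = 2.0 * x + 0.1 * rng.normal(size=m)  # noisy targets
            grad = np.mean((w * x - y) * x)         # batch LMS gradient
            w -= lr * grad
            batch *= growth                         # exponential schedule
        return w

    print(lms_adaptive_batch())  # converges close to 2.0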
Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning
Model learning combined with dynamic programming has been shown to be effective for learning control of continuous state dynamic systems. The simplest method assumes the learned model is correct and applies dynamic programming to it, but many approximators provide uncertainty estimates on the fit. How can they be exploited? This paper addresses the case where the system must be prevented from having catastrophic failures during learning. We propose a new algorithm adapted from the dual control literature and use Bayesian locally weighted regression models with dynamic programming. A common reinforcement learning assumption is that aggressive exploration should be encouraged. This paper addresses the converse case in which the system has to rein in exploration.
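One crude way to exploit uncertainty estimates in dynamic programming is to penalize states where the learned model's predictive variance is high, steering the policy away from poorly modeled (and therefore risky) regions. The sketch below does this on an invented 1-D chain; the penalty term, rewards, and variances are placeholders, and this is not the paper's dual-control algorithm.

    import numpy as np

    def safe_value_iteration(reward, sigma, gamma=0.9, penalty=5.0, iters=100):
        """Value iteration on a toy chain (with wraparound ends) where
        each state's value is reduced in proportion to the model's
        predictive uncertainty sigma, so uncertain states look risky."""
        V = np.zeros(len(reward))
        for _ in range(iters):
            left, right = np.roll(V, 1), np.roll(V, -1)
            V = reward - penalty * sigma + gamma * np.maximum(left, right)
        return V

    reward = np.array([0.0, 0.0, 1.0, 0.0, 10.0])
    sigma = np.array([0.1, 0.1, 0.1, 2.0, 0.1])  # state 3 is poorly modeled
    print(np.round(safe_value_iteration(reward, sigma), 2))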
Unification of Information Maximization and Minimization
In the present paper, we propose a method to unify information maximization and minimization in hidden units. The information maximization and minimization are performed on two different levels: the collective and the individual level. Thus, two kinds of information are defined: collective and individual information. By maximizing collective information and by minimizing individual information, simple networks can be generated in terms of the number of connections and the number of hidden units. The obtained networks are expected to give better generalization and improved interpretation of internal representations.
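Information measures of this kind are often built from the entropy of a hidden unit's firing rates across patterns. As one hedged reading, the sketch below scores a single sigmoidal unit by the entropy of its average activation minus its average per-pattern entropy, so a selective unit scores high and a uniformly firing one scores near zero; the formula is illustrative, not the paper's exact definition.

    import numpy as np

    def entropy(p):
        """Binary entropy in bits, clipped away from 0 and 1."""
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    def unit_information(acts):
        """Information a sigmoidal hidden unit carries about the input
        patterns: entropy of its mean activation minus its mean
        per-pattern entropy (one possible reading of the measure)."""
        acts = np.asarray(acts)
        return entropy(acts.mean()) - entropy(acts).mean()

    # Unit A responds selectively to patterns; unit B fires uniformly.
    print(unit_information([0.95, 0.05, 0.9, 0.1]))  # high: informative
    print(unit_information([0.5, 0.5, 0.5, 0.5]))    # ~0: uninformative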
Gaussian Processes for Bayesian Classification via Hybrid Monte Carlo
Barber, David, Williams, Christopher K. I.
The full Bayesian method for applying neural networks to a prediction problem is to set up the prior/hyperprior structure for the net and then perform the necessary integrals. However, these integrals are not tractable analytically, and Markov Chain Monte Carlo (MCMC) methods are slow, especially if the parameter space is high-dimensional. Using Gaussian processes we can approximate the weight space integral analytically, so that only a small number of hyperparameters need be integrated over by MCMC methods. We have applied this idea to classification problems, obtaining excellent results on the real-world problems investigated so far.
1 INTRODUCTION
To make predictions based on a set of training data, fundamentally we need to combine our prior beliefs about possible predictive functions with the data at hand. In the Bayesian approach to neural networks a prior on the weights in the net induces a prior distribution over functions.
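The flavor of the approach can be sketched with a squared-exponential covariance whose length scale and amplitude play the role of the hyperparameters that the paper integrates over by MCMC. Below is a deliberately crude stand-in: GP regression on +/-1 labels with the latent predictive mean squashed through a sigmoid. The kernel parameters, noise level, and data are invented, and this is not the paper's full Bayesian treatment.

    import numpy as np

    def rbf_kernel(X1, X2, length=1.0, amp=1.0):
        """Squared-exponential covariance; 'length' and 'amp' stand in
        for the hyperparameters handled by MCMC in the paper."""
        d = X1[:, None] - X2[None, :]
        return amp * np.exp(-0.5 * (d / length) ** 2)

    def gp_class_prob(X, y, X_star, noise=0.1):
        """Crude GP 'classifier': regress on +/-1 labels, then squash
        the latent predictive mean through a sigmoid."""
        K = rbf_kernel(X, X) + noise * np.eye(len(X))
        f_star = rbf_kernel(X_star, X) @ np.linalg.solve(K, y)
        return 1.0 / (1.0 + np.exp(-f_star))

    X = np.array([-2.0, -1.0, 1.0, 2.0])
    y = np.array([-1.0, -1.0, 1.0, 1.0])
    print(gp_class_prob(X, y, np.array([-1.5, 0.0, 1.5])))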