Information Technology
An Orientation Selective Neural Network for Pattern Identification in Particle Detectors
Abramowicz, Halina, Horn, David, Naftaly, Ury, Sahar-Pikielny, Carmit
Constructing a multi-layered neural network with fixed architecture which implements orientation selectivity, we define output elements corresponding to different orientations, which allow us to make a selection decision. The algorithm takes into account the granularity of the lattice as well as the presence of noise and inefficiencies. The method is applied to a sample of data collected with the ZEUS detector at HERA in order to identify cosmic muons that leave a linear pattern of signals in the segmented calorimeter. A two-dimensional representation of the relevant part of the detector is used. The algorithm performs very well. Given its architecture, this system becomes a good candidate for fast pattern recognition in parallel processing devices.
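The abstract gives no code, but the selection step it describes can be caricatured in a short sketch: score a noisy binary hit pattern on a 2-D lattice against a bank of oriented line templates (one output element per orientation) and select the orientation with the largest overlap. The lattice size, noise and inefficiency rates, and template construction below are illustrative assumptions, not the ZEUS analysis itself.

    import numpy as np

    def orientation_templates(size=16, n_orientations=8):
        """One binary line template per orientation on a size x size lattice."""
        templates, c = [], (size - 1) / 2.0
        for k in range(n_orientations):
            theta = np.pi * k / n_orientations
            t = np.zeros((size, size))
            for r in np.linspace(-c, c, 4 * size):
                i = int(round(c + r * np.sin(theta)))
                j = int(round(c + r * np.cos(theta)))
                if 0 <= i < size and 0 <= j < size:
                    t[i, j] = 1.0
            templates.append(t / t.sum())
        return templates

    def classify_orientation(hits, templates):
        """The output element with the largest overlap wins the selection."""
        scores = [float((hits * t).sum()) for t in templates]
        return int(np.argmax(scores)), scores

    # Noisy linear hit pattern at ~45 degrees, with inefficiencies
    rng = np.random.default_rng(0)
    size = 16
    hits = np.zeros((size, size))
    for d in range(size):
        if rng.random() > 0.2:                     # 20% cell inefficiency
            hits[d, d] = 1.0
    hits += (rng.random((size, size)) < 0.02)      # 2% noise hits

    best, scores = classify_orientation(np.clip(hits, 0, 1), orientation_templates(size))
    print("selected orientation index:", best)

Because the network is a fixed bank of local template matches followed by a winner-take-all over output elements, all scores can be computed in parallel, which is the property the abstract points to for fast hardware implementations.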
Rapid Visual Processing using Spike Asynchrony
Thorpe, Simon J., Gautrais, Jacques
We have investigated the possibility that rapid processing in the visual system could be achieved by using the order of firing in different neurones as a code, rather than more conventional firing rate schemes. Using SPIKENET, a neural net simulator based on integrate-and-fire neurones and in which neurones in the input layer function as analog-to-delay converters, we have modeled the initial stages of visual processing. Initial results are extremely promising. Even with activity in retinal output cells limited to one spike per neuron per image (effectively ruling out any form of rate coding), sophisticated processing based on asynchronous activation was nonetheless possible.
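SPIKENET itself is not reproduced here, but the two ingredients the abstract names can be sketched in a few lines: an analog-to-delay conversion in which brighter inputs fire earlier, and a readout that uses only the rank order of those single spikes. The exponential rank weighting is an assumed illustration of order-sensitive decoding, not the simulator's mechanism.

    import numpy as np

    def intensities_to_latencies(image, t_max=50.0):
        """Analog-to-delay conversion: brighter pixels fire earlier; one spike each."""
        x = image.astype(float).ravel()
        x = (x - x.min()) / (x.ptp() + 1e-9)
        return t_max * (1.0 - x)                 # latency (ms), inverse to intensity

    def rank_order_code(latencies):
        """The code is the order of firing, not the firing rates."""
        return np.argsort(latencies)             # neuron indices sorted by spike time

    def rank_order_activation(order, weights, decay=0.9):
        """Earlier-ranked spikes contribute more (assumed decay weighting)."""
        return sum((decay ** rank) * weights[neuron]
                   for rank, neuron in enumerate(order))

    rng = np.random.default_rng(1)
    image = rng.random((8, 8))
    order = rank_order_code(intensities_to_latencies(image))
    weights = rng.random(64)
    print("first 5 neurons to fire:", order[:5])
    print("rank-order activation:", rank_order_activation(order, weights))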
Removing Noise in On-Line Search using Adaptive Batch Sizes
Stochastic (online) learning can be faster than batch learning. However, at late times, the learning rate must be annealed to remove the noise present in the stochastic weight updates. In this annealing phase, the convergence rate (in mean square) is at best proportional to 1/T, where T is the number of input presentations. An alternative is to increase the batch size to remove the noise. In this paper we explore convergence for LMS using 1) small but fixed batch sizes and 2) an adaptive batch size. We show that the best adaptive batch schedule is exponential and has a rate of convergence which is the same as for annealing, i.e., at best proportional to 1/T.
1 Introduction
Stochastic (online) learning can speed learning over batch training, particularly when data sets are large and contain redundant information [Mol93]. However, at late times in learning, noise present in the weight updates prevents complete convergence from taking place. To reduce the noise, the learning rate is slowly decreased (annealed) at late times. The optimal annealing schedule is asymptotically proportional to 1/t, where t is the iteration [Gol87, LO93, Orr95].
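The alternative the paper analyzes, keeping the learning rate fixed and growing the batch size instead, can be sketched for LMS as follows. The growth factor, learning rate, and toy regression problem here are arbitrary illustrations; the paper derives the actual optimal exponential schedule.

    import numpy as np

    rng = np.random.default_rng(2)
    d, N = 5, 20000
    w_true = rng.normal(size=d)
    X = rng.normal(size=(N, d))
    y = X @ w_true + 0.1 * rng.normal(size=N)

    w = np.zeros(d)
    eta = 0.05                    # fixed rate; noise is removed by batch growth
    batch, pos = 1, 0
    while pos + batch <= N:
        xb, yb = X[pos:pos + batch], y[pos:pos + batch]
        grad = xb.T @ (xb @ w - yb) / batch        # LMS gradient on current batch
        w -= eta * grad
        pos += batch
        batch = max(batch + 1, int(batch * 1.1))   # roughly exponential growth
    print("squared weight error:", float(np.sum((w - w_true) ** 2)))

Averaging the gradient over a batch of size B divides the update noise variance by B, so growing B plays the same noise-removal role that annealing the learning rate plays in the online setting.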
Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning
Model learning combined with dynamic programming has been shown to be effective for learning control of continuous state dynamic systems. The simplest method assumes the learned model is correct and applies dynamic programming to it, but many approximators provide uncertainty estimates on the fit. How can they be exploited? This paper addresses the case where the system must be prevented from having catastrophic failures during learning. We propose a new algorithm adapted from the dual control literature and use Bayesian locally weighted regression models with dynamic programming. A common reinforcement learning assumption is that aggressive exploration should be encouraged; this paper addresses the converse case, in which the system has to rein in exploration.
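One crude way to see how uncertainty estimates can rein in exploration is a pessimistic value iteration on a toy discrete model: transitions whose model estimates rest on little data are charged a catastrophe penalty. This is only a discrete-state caricature of the idea; the paper's algorithm works with Bayesian locally weighted regression on continuous states, and the pseudo-counts and penalty below are invented for illustration.

    import numpy as np

    n, gamma = 10, 0.95
    counts = np.ones((n, 2, n)) * 0.1              # Dirichlet-style pseudo-counts
    for s in range(n - 1):
        counts[s, 1, s + 1] += 5.0                 # action "right" well observed
        counts[s, 0, max(s - 1, 0)] += 0.5         # action "left" barely observed
    reward = np.zeros(n); reward[-1] = 1.0
    catastrophe_penalty = -10.0

    V = np.zeros(n)
    for _ in range(200):
        Q = np.zeros((n, 2))
        for s in range(n):
            for a in range(2):
                p = counts[s, a] / counts[s, a].sum()
                uncertainty = 1.0 / counts[s, a].sum()   # crude epistemic proxy
                # Pessimism: uncertain transitions are assumed to risk failure.
                Q[s, a] = p @ (reward + gamma * V) + uncertainty * catastrophe_penalty
        V = Q.max(axis=1)
    print("policy preferring well-modeled actions:", Q.argmax(axis=1))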
Unification of Information Maximization and Minimization
In the present paper, we propose a method to unify information maximization and minimization in hidden units. The information maximization and minimization are performed at two different levels: the collective and the individual. Thus, two kinds of information are defined: collective and individual. By maximizing collective information and minimizing individual information, simple networks can be generated in terms of the number of connections and the number of hidden units. The resulting networks are expected to give better generalization and improved interpretation of internal representations.
Gaussian Processes for Bayesian Classification via Hybrid Monte Carlo
Barber, David, Williams, Christopher K. I.
The full Bayesian method for applying neural networks to a prediction problem is to set up the prior/hyperprior structure for the net and then perform the necessary integrals. However, these integrals are not tractable analytically, and Markov Chain Monte Carlo (MCMC) methods are slow, especially if the parameter space is high-dimensional. Using Gaussian processes we can approximate the weight space integral analytically, so that only a small number of hyperparameters need be integrated over by MCMC methods. We have applied this idea to classification problems, obtaining excellent results on the real-world problems investigated so far.
1 INTRODUCTION
To make predictions based on a set of training data, fundamentally we need to combine our prior beliefs about possible predictive functions with the data at hand. In the Bayesian approach to neural networks a prior on the weights in the net induces a prior distribution over functions.
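The prior over functions that makes this tractable can be shown in a few lines: a Gaussian process with a squared-exponential covariance defines the latent function, which is squashed through a sigmoid to give class probabilities. For simplicity this sketch fixes the hyperparameters (lengthscale and variance) rather than integrating them out with hybrid Monte Carlo as the paper does.

    import numpy as np

    def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
        """Squared-exponential covariance: nearby inputs get correlated values."""
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

    rng = np.random.default_rng(3)
    X = rng.uniform(-3, 3, size=(50, 1))
    K = rbf_kernel(X, X) + 1e-6 * np.eye(50)        # jitter for numerical stability
    f = rng.multivariate_normal(np.zeros(50), K)    # a draw from the GP prior
    p = 1.0 / (1.0 + np.exp(-f))                    # latent values -> class probabilities
    y = (rng.random(50) < p).astype(int)            # labels sampled under the model
    print("class balance of sampled labels:", y.mean())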
NeuroScale: Novel Topographic Feature Extraction using RBF Networks
Lowe, David, Tipping, Michael E.
We seek a dimension-reducing, topographic transformation of data for the purposes of visualisation and analysis. By 'topographic', we imply that the geometric structure of the data be optimally preserved in the transformation, and the embodiment of this constraint is that the inter-point distances in the feature space should correspond as closely as possible to those distances in the data space. The implementation of this principle by a neural network is very simple. A Radial Basis Function (RBF) neural network is utilised to predict the coordinates of the data point in the transformed feature space. The locations of the feature points are indirectly determined by adjusting the weights of the network. The transformation is determined by optimising the network parameters in order to minimise a suitable error measure that embodies the topographic principle. Further details may be found in (Lowe, 1993; Lowe and Tipping, 1996).
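A minimal sketch of that topographic principle: an RBF layer maps each data point to 2-D coordinates, and gradient descent on the network weights minimises a STRESS measure, the summed squared mismatch between data-space and feature-space inter-point distances. Basis widths, centre selection, and the step size here are illustrative choices; the cited papers give the actual training procedure.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 60
    X = rng.normal(size=(n, 5))                             # data space
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)    # target distances

    centres = X[rng.choice(n, 15, replace=False)]
    Phi = np.exp(-((X[:, None] - centres[None]) ** 2).sum(-1) / (2 * 2.0 ** 2))
    W = rng.normal(scale=0.5, size=(15, 2))                 # weights -> 2-D feature space

    for _ in range(1000):
        Y = Phi @ W                                         # predicted feature coordinates
        diff = Y[:, None] - Y[None, :]
        d = np.linalg.norm(diff, axis=-1) + 1e-9
        # STRESS = sum_{i<j} (D_ij - d_ij)^2; gradient w.r.t. each y_i:
        grad_Y = 2 * (((d - D) / d)[:, :, None] * diff).sum(axis=1)
        W -= (0.05 / n) * Phi.T @ grad_Y                    # chain rule through linear layer
    print("final STRESS:", float(((D - d) ** 2)[np.triu_indices(n, 1)].sum()))

Because the feature coordinates are produced by a parametric network rather than optimised directly, the trained map generalises to new points, which is the advantage over classical multidimensional scaling.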
Interpreting Images by Propagating Bayesian Beliefs
A central theme of computational vision research has been the realization that reliable estimation of local scene properties requires propagating measurements across the image. Many authors have therefore suggested solving vision problems using architectures of locally connected units updating their activity in parallel. Unfortunately, the convergence of traditional relaxation methods on such architectures has proven to be excruciatingly slow, and in general they do not guarantee that the stable point will be a global minimum. In this paper we show that an architecture in which Bayesian beliefs about image properties are propagated between neighboring units yields convergence times which are several orders of magnitude faster than traditional methods and avoids local minima. In particular, our architecture is non-iterative in the sense of Marr [5]: at every time step, the local estimates at a given location are optimal given the information which has already been propagated to that location.
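The flavor of belief propagation between neighboring units can be shown on a one-dimensional chain, denoising a binary signal: each node combines its local evidence with forward and backward messages from its neighbors, and after one sweep in each direction every node's belief is exact. This toy is an assumed illustration; the paper's architecture propagates beliefs across a 2-D image.

    import numpy as np

    n, p_flip, p_stay = 12, 0.2, 0.8
    rng = np.random.default_rng(5)
    true = np.cumsum(rng.random(n) < 0.1) % 2                # piecewise-constant signal
    obs = np.where(rng.random(n) < p_flip, 1 - true, true)   # noisy observations

    phi = np.stack([np.where(obs == s, 1 - p_flip, p_flip) for s in (0, 1)], axis=1)
    psi = np.array([[p_stay, 1 - p_stay], [1 - p_stay, p_stay]])  # smoothness prior

    fwd = np.ones((n, 2)); bwd = np.ones((n, 2))
    for i in range(1, n):                                    # forward messages
        m = psi.T @ (phi[i - 1] * fwd[i - 1]); fwd[i] = m / m.sum()
    for i in range(n - 2, -1, -1):                           # backward messages
        m = psi @ (phi[i + 1] * bwd[i + 1]); bwd[i] = m / m.sum()
    belief = phi * fwd * bwd
    belief /= belief.sum(axis=1, keepdims=True)              # exact marginals on a chain
    print("estimate:", belief.argmax(axis=1))
    print("truth:   ", true)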
A Convergence Proof for the Softassign Quadratic Assignment Algorithm
Rangarajan, Anand, Yuille, Alan L., Gold, Steven, Mjolsness, Eric
The softassign quadratic assignment algorithm has recently emerged as an effective strategy for a variety of optimization problems in pattern recognition and combinatorial optimization. While the effectiveness of the algorithm was demonstrated in thousands of simulations, there was no known proof of convergence. Here, we provide a proof of convergence for the most general form of the algorithm.
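The core softassign step, whose iteration the proof concerns, is a softmax over a benefit matrix followed by Sinkhorn balancing (alternating row and column normalisation), which drives the match matrix toward a permutation as the inverse temperature grows. The sketch below shows only this inner step on a random benefit matrix; in the full quadratic assignment algorithm the benefit matrix is re-linearised from the current match matrix at each outer iteration.

    import numpy as np

    def sinkhorn(M, iters=50):
        """Alternating row/column normalisation -> doubly stochastic matrix."""
        for _ in range(iters):
            M = M / M.sum(axis=1, keepdims=True)
            M = M / M.sum(axis=0, keepdims=True)
        return M

    def softassign(Q, beta):
        """Softmax over the benefit matrix, then Sinkhorn balancing."""
        return sinkhorn(np.exp(beta * (Q - Q.max())))

    rng = np.random.default_rng(6)
    Q = rng.random((5, 5))                       # stand-in benefit matrix
    for beta in np.geomspace(0.5, 50.0, 20):     # deterministic annealing in beta
        M = softassign(Q, beta)
    print("near-permutation match matrix:\n", np.round(M, 2))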