
Bayesian Self-Organization

Neural Information Processing Systems

Smirnakis, Lyman Laboratory of Physics, Harvard University, Cambridge, MA 02138; Lei Xu*, Dept. of Computer Science, HSH ENG BLDG, Room 1006, The Chinese University of Hong Kong, Shatin, NT, Hong Kong. Abstract: Recent work by Becker and Hinton (Becker and Hinton, 1992) shows a promising mechanism, based on maximizing mutual information assuming spatial coherence, by which a system can self-organize to learn visual abilities such as binocular stereo. We introduce a more general criterion, based on Bayesian probability theory, and thereby demonstrate a connection to Bayesian theories of visual perception and to other organization principles for early vision (Atick and Redlich, 1990). Methods for implementation using variants of stochastic learning are described and, for the special case of linear filtering, we derive an analytic expression for the output. 1 Introduction The input intensity patterns received by the human visual system are typically complicated functions of the object surfaces and light sources in the world. Thus the visual system must be able to extract information from the input intensities that is relatively independent of the actual intensity values. (*Lei Xu was a research scholar in the Division of Applied Sciences at Harvard University while this work was performed.)
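The Becker–Hinton idea the abstract builds on can be sketched numerically. In the sketch below, two linear units look at neighbouring "patches" that share a common scalar signal, and gradient ascent maximizes the Gaussian mutual-information proxy 0.5·log(V(a+b)/V(a−b)) between their outputs. The data dimensions, noise level, and linear units are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "spatially coherent" input: two neighbouring patches share a common
# scalar signal s plus independent pixel noise (illustrative stand-in for
# the stereo disparity signal in Becker & Hinton, 1992).
n, d = 4000, 8
s = rng.normal(size=(n, 1))
ua, ub = rng.normal(size=d), rng.normal(size=d)
Xa = s * ua + 0.3 * rng.normal(size=(n, d))
Xb = s * ub + 0.3 * rng.normal(size=(n, d))

def info(wa, wb):
    """Gaussian proxy for mutual information between the two unit outputs:
    0.5 * log(V(a+b) / V(a-b))."""
    a, b = Xa @ wa, Xb @ wb
    return 0.5 * np.log((a + b).var() / (a - b).var())

wa, wb = rng.normal(size=d), rng.normal(size=d)
i0 = info(wa, wb)
lr = 0.1
for _ in range(500):
    a, b = Xa @ wa, Xb @ wb
    cp = (a + b) - (a + b).mean()
    cm = (a - b) - (a - b).mean()
    vp, vm = (cp ** 2).mean(), (cm ** 2).mean()
    # analytic gradient of the objective w.r.t. each unit's weights
    wa = wa + lr * ((Xa.T @ cp) / (n * vp) - (Xa.T @ cm) / (n * vm))
    wb = wb + lr * ((Xb.T @ cp) / (n * vp) + (Xb.T @ cm) / (n * vm))
    wa /= np.linalg.norm(wa)
    wb /= np.linalg.norm(wb)   # fix the scale; only the direction matters

print(info(wa, wb), ">", i0)
```

Both units converge on the shared signal direction, which is exactly the "spatial coherence" assumption doing the work: only the common component survives in a+b while the independent noise dominates a−b.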


How to Describe Neuronal Activity: Spikes, Rates, or Assemblies?

Neural Information Processing Systems

What is the 'correct' theoretical description of neuronal activity? The analysis of the dynamics of a globally connected network of spiking neurons (the Spike Response Model) shows that a description by mean firing rates is possible only if active neurons fire incoherently. If firing occurs coherently or with spatiotemporal correlations, the spike structure of the neural code becomes relevant. Alternatively, neurons can be gathered into local or distributed ensembles or 'assemblies'. A description based on the mean ensemble activity is, in principle, possible, but the interaction between different assemblies becomes highly nonlinear. A description with spikes should therefore be preferred.
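The incoherence condition can be checked with a toy simulation (not the paper's Spike Response Model): when independent neurons fire at a fixed probability per bin, the population activity is nearly constant and a mean rate describes it well; when a common modulation synchronizes them, the bin-to-bin fluctuations grow far beyond what the mean rate captures.

```python
import numpy as np

rng = np.random.default_rng(1)

# 500 neurons, 1000 bins of ~1 ms, firing probability p per bin
# (all constants are illustrative).
n_neurons, n_bins, p = 500, 1000, 0.05

# Incoherent: every neuron fires independently.
incoherent = rng.random((n_neurons, n_bins)) < p

# Coherent: a common slow oscillation drives all neurons together.
drive = p * (1 + np.sin(2 * np.pi * np.arange(n_bins) / 50))
coherent = rng.random((n_neurons, n_bins)) < drive

rate_inc = incoherent.mean(axis=0)   # population activity per bin
rate_coh = coherent.mean(axis=0)

print(rate_inc.var(), rate_coh.var())
```

Both populations have the same mean rate, yet the coherent one has far larger population fluctuations, so a rate-only description discards the structure that carries the signal.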


A Comparison of Dynamic Reposing and Tangent Distance for Drug Activity Prediction

Neural Information Processing Systems

Thomas G. Dietterich, Arris Pharmaceutical Corporation and Oregon State University, Corvallis, OR 97331-3202; Ajay N. Jain, Arris Pharmaceutical Corporation, 385 Oyster Point Blvd., Suite 3, South San Francisco, CA 94080; Richard H. Lathrop and Tomas Lozano-Perez, Arris Pharmaceutical Corporation and MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge, MA 02139. Abstract: In drug activity prediction (as in handwritten character recognition), the features extracted to describe a training example depend on the pose (location, orientation, etc.) of the example. In handwritten character recognition, one of the best techniques for addressing this problem is the tangent distance method of Simard, LeCun and Denker (1993). Jain et al. (1993a; 1993b) introduce a new technique, dynamic reposing, that also addresses this problem. Dynamic reposing iteratively learns a neural network and then reposes the examples in an effort to maximize the predicted output values. New models are trained and new poses computed until models and poses converge. This paper compares dynamic reposing to the tangent distance method on the task of predicting the biological activity of musk compounds.
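The train/repose alternation described above can be sketched in a few lines. Here the "network" is plain linear least squares and the pose features are random vectors, illustrative stand-ins for the paper's neural network and molecular features; the point is only the loop structure: fit on the currently selected poses, then re-select for each compound the pose the model scores highest, until both stop changing.

```python
import numpy as np

rng = np.random.default_rng(2)

n_mol, n_pose, d = 60, 5, 4
poses = rng.normal(size=(n_mol, n_pose, d))   # one feature vector per pose
w_true = rng.normal(size=d)
y = (poses @ w_true).max(axis=1)              # activity set by the best pose

chosen = np.zeros(n_mol, dtype=int)           # start from an arbitrary pose
for _ in range(10):
    X = poses[np.arange(n_mol), chosen]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # "train" on current poses
    new = (poses @ w).argmax(axis=1)           # repose: pick the pose the
    if np.array_equal(new, chosen):            # model now scores highest
        break                                  # models and poses converged
    chosen = new

mse = np.mean((poses[np.arange(n_mol), chosen] @ w - y) ** 2)
print(mse)
```

Even starting from arbitrary poses, the alternation quickly locks onto the pose that actually determines each compound's activity, which is why the fit improves with each repose step.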


Recognition-based Segmentation of On-Line Cursive Handwriting

Neural Information Processing Systems

This paper introduces a new recognition-based segmentation approach to recognizing online cursive handwriting from a database of 10,000 English words. The original input stream of x, y pen coordinates is encoded as a sequence of uniform stroke descriptions that are processed by six feed-forward neural networks, each designed to recognize letters of different sizes. Words are then recognized by performing best-first search over the space of all possible segmentations. Results demonstrate that the method is effective at both writer-dependent recognition (1.7% to 15.5% error rate) and writer-independent recognition (5.2% to 31.1% error rate). 1 Introduction With the advent of pen-based computers, the problem of automatically recognizing handwriting from the motions of a pen has gained much significance. Progress has been made in reading disjoint block letters [Weissman et.
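Best-first search over segmentations can be sketched with a priority queue. The scorer below is a hypothetical stand-in for the paper's letter-recognition networks: it assigns each candidate run of strokes a probability of being a letter, and uniform-cost search over −log probabilities returns the segmentation maximizing the product of letter scores.

```python
import heapq
import math

strokes = "handw"   # stand-in: one symbol per uniform stroke description

def span_prob(piece):
    """Hypothetical letter recognizer: probability that this run of
    strokes forms a letter (favours 1- and 2-stroke letters)."""
    return {1: 0.9, 2: 0.7}.get(len(piece), 0.1)

def best_segmentation(seq):
    # Uniform-cost best-first search: the cost of a piece is -log(prob),
    # so the first complete segmentation popped from the heap maximizes
    # the product of the per-letter scores.
    heap = [(0.0, 0, [])]
    while heap:
        cost, pos, segs = heapq.heappop(heap)
        if pos == len(seq):
            return segs
        for end in range(pos + 1, len(seq) + 1):
            piece = seq[pos:end]
            heapq.heappush(
                heap, (cost - math.log(span_prob(piece)), end, segs + [piece]))

print(best_segmentation(strokes))   # -> ['h', 'a', 'n', 'd', 'w']
```

Using −log probabilities keeps every edge cost nonnegative, so the first completed segmentation popped is guaranteed optimal, the standard uniform-cost-search property the real system can rely on as well.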



Cross-Validation Estimates IMSE

Neural Information Processing Systems

Integrated Mean Squared Error (IMSE) is a version of the usual mean squared error criterion, averaged over all possible training sets of a given size. If it could be observed, it could be used to determine optimal network complexity or optimal data subsets for efficient training. We show that two common methods of cross-validating average squared error deliver unbiased estimates of IMSE, converging to IMSE with probability one. We also show that two variants of the cross-validation measure provide unbiased IMSE-based estimates potentially useful for selecting optimal data subsets. 1 Summary To begin, assume we are given a fixed network architecture. Let z^N denote a given set of N training examples.
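The ordinary K-fold cross-validated average squared error that the paper analyzes can be computed as below. The "network" here is plain linear least squares (an illustrative choice; the estimator, not the model, is the point): each fold is held out once, the model is fit on the rest, and the held-out squared errors are averaged.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic regression data (illustrative): linear signal plus noise.
N, d, K = 120, 3, 10
X = rng.normal(size=(N, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=N)

folds = np.array_split(rng.permutation(N), K)
errs = []
for k in range(K):
    test = folds[k]
    train = np.concatenate([folds[j] for j in range(K) if j != k])
    w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    errs.append(np.mean((X[test] @ w - y[test]) ** 2))

cv_mse = np.mean(errs)   # estimates IMSE at training-set size N - N/K
print(cv_mse)
```

Because each fold's model is trained on a fresh subset, the average over folds behaves like an average over training sets, which is the sense in which it estimates IMSE rather than the error of any single trained network.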


Coupled Dynamics of Fast Neurons and Slow Interactions

Neural Information Processing Systems

A.C.C. Coolen, R.W. Penney, D. Sherrington, Dept. of Physics - Theoretical Physics, University of Oxford, 1 Keble Road, Oxford OX1 3NP, U.K. Abstract: A simple model of coupled dynamics of fast neurons and slow interactions, modelling self-organization in recurrent neural networks, leads naturally to an effective statistical mechanics characterized by a partition function which is an average over a replicated system. This is reminiscent of the replica trick used to study spin-glasses, but with the difference that the number of replicas has a physical meaning as the ratio of two temperatures and can be varied throughout the whole range of real values. The model has interesting phase consequences as a function of varying this ratio and external stimuli, and can be extended to a range of other models. 1 A SIMPLE MODEL WITH FAST DYNAMIC NEURONS AND SLOW DYNAMIC INTERACTIONS As the basic archetypal model we consider a system of Ising spin neurons σ_i ∈ {-1, 1}, i ∈ {1, ..., N}, interacting via continuous-valued symmetric interactions, J_ij, which themselves evolve in response to the states of the neurons. The neurons are taken to have a stochastic field-alignment dynamics which is fast compared with the evolution rate of the interactions J_ij, such that on the timescale of the J_ij-dynamics the neurons are effectively in equilibrium according to a Boltzmann distribution

p({σ_i}) = exp(-β H_{{J_ij}}({σ_i})) / Z_β({J_ij}),   (1)

where

H_{{J_ij}}({σ_i}) = -Σ_{i<j} J_ij σ_i σ_j   (2)

and the subscript {J_ij} indicates that the {J_ij} are to be considered as quenched variables. In practice, several specific types of dynamics which obey detailed balance lead to the equilibrium distribution (1), such as a Markov process with single-spin-flip Glauber dynamics [1]. The interactions, in turn, evolve slowly by a Hebbian-type rule with decay and noise,

(d/dt) J_ij = ⟨σ_i σ_j⟩ - μ J_ij + η_ij(t),   (3)

where ⟨...⟩ denotes a thermal average over the equilibrium distribution (1). The second term acts to limit the magnitude of J_ij; the η_ij(t) are independent Gaussian white-noise terms of variance 2/β̃, where β̃ is the characteristic inverse temperature of the interaction system. Equation (3) can be rewritten as

(d/dt) J_ij = -∂H̃({J_kl})/∂J_ij + η_ij(t),   (4)

where the effective Hamiltonian H̃({J_ij}) is given by

H̃({J_ij}) = -(1/β) ln Z_β({J_ij}) + (μ/2) Σ_{i<j} J_ij².   (5)

We now recognise (4) as having the form of a Langevin equation, so that the equilibrium distribution of the interaction system is given by a Boltzmann form, with partition function

Z = ∫ Π_{i<j} dJ_ij [Z_β({J_ij})]^n exp(-β̃ μ Σ_{i<j} J_ij² / 2),   (6)

where n = β̃/β. We may use Z as a generating functional to produce thermodynamic averages of state variables f({σ_i}; {J_ij}) in the combined system by adding suitable infinitesimal source terms to the neuron Hamiltonian (2). In fact, any real n is possible by tuning the ratio between the two β's. In the formulation presented in this paper n is always nonnegative, but negative values are possible if the Hebbian rule of (3) is replaced by an anti-Hebbian form with ⟨σ_i σ_j⟩ replaced by -⟨σ_i σ_j⟩ (the case of negative n is being studied by Mezard and coworkers [7]).
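The two-timescale dynamics above can be imitated with a minimal numerical sketch: fast Glauber spin flips at inverse temperature β between each slow update of the couplings, which follow a Hebbian drive with decay and noise at inverse temperature β̃. The system size and all constants are schematic choices for illustration, not values from the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(4)

N, beta, beta_tilde, mu, dt = 12, 2.0, 20.0, 1.0, 0.01
s = rng.choice([-1, 1], size=N)
J = np.zeros((N, N))

def glauber_sweep(s, J):
    """One sweep of single-spin-flip Glauber dynamics at inverse temp beta."""
    for i in rng.permutation(N):
        h = J[i] @ s - J[i, i] * s[i]          # local field on spin i
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
        s[i] = 1 if rng.random() < p_up else -1
    return s

for _ in range(400):          # slow time steps for the couplings
    corr = np.zeros((N, N))   # fast spins ~equilibrate between J updates
    for _ in range(20):
        s = glauber_sweep(s, J)
        corr += np.outer(s, s)
    corr /= 20
    noise = rng.normal(size=(N, N)) * np.sqrt(2 * dt / beta_tilde)
    noise = (noise + noise.T) / 2              # keep J symmetric
    J += dt * (corr - mu * J) + noise          # Hebbian drive, decay, noise
    np.fill_diagonal(J, 0.0)

# The couplings self-organize to reinforce the spin correlations.
print(np.mean(J * np.outer(s, s)))
```

The positive feedback between spins and couplings drives the system into a self-sustaining frozen state, the kind of self-organization that the replicated partition function (6) describes in equilibrium.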


Resolving motion ambiguities

Neural Information Processing Systems

We address the problem of optical flow reconstruction and in particular the problem of resolving ambiguities near edges. They occur due to (i) the aperture problem and (ii) the occlusion problem, where pixels on both sides of an intensity edge are assigned the same velocity estimates (and confidence). However, these measurements are correct for just one side of the edge (the non-occluded one). Our approach is to introduce an uncertainty field with respect to the estimates and confidence measures. We note that the confidence measures are large at intensity edges and larger at the convex sides of the edges, i.e. inside corners, than at the concave side. We resolve the ambiguities through local interactions via coupled Markov random fields (MRF). The result is the detection of motion for regions of images with large global convexity.
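A 1-D caricature (not the paper's coupled-MRF formulation) shows why confidence matters for such local interactions: velocity measurements are reliable only at an intensity edge, and a simple confidence-weighted relaxation propagates the high-confidence edge velocity into the ambiguous interior while low-confidence measurements barely constrain the result.

```python
import numpy as np

n = 50
v_meas = np.zeros(n)
conf = np.full(n, 0.01)             # low confidence away from edges
v_meas[10], conf[10] = 1.0, 1.0     # one high-confidence estimate at an edge

v = v_meas.copy()
lam = 0.1                           # smoothness weight (illustrative)
for _ in range(2000):
    neigh = (np.roll(v, 1) + np.roll(v, -1)) / 2
    v = (conf * v_meas + lam * neigh) / (conf + lam)   # data vs. smoothness

print(v[11], v[30])
```

At convergence the field decays smoothly away from the trusted edge measurement: pixels near the edge inherit its velocity, distant ambiguous pixels are only weakly influenced, which is the basic behaviour the coupled-MRF interactions exploit in 2-D.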


Autoencoders, Minimum Description Length and Helmholtz Free Energy

Neural Information Processing Systems

An autoencoder network uses a set of recognition weights to convert an input vector into a code vector. It then uses a set of generative weights to convert the code vector into an approximate reconstruction of the input vector. We derive an objective function for training autoencoders based on the Minimum Description Length (MDL) principle. The aim is to minimize the information required to describe both the code vector and the reconstruction error. We show that this information is minimized by choosing code vectors stochastically according to a Boltzmann distribution, where the generative weights define the energy of each possible code vector given the input vector. Unfortunately, if the code vectors use distributed representations, it is exponentially expensive to compute this Boltzmann distribution because it involves all possible code vectors. We show that the recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution and that this approximation gives an upper bound on the description length. Even when this bound is poor, it can be used as a Lyapunov function for learning both the generative and the recognition weights. We demonstrate that this approach can be used to learn factorial codes.
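The upper-bound claim can be verified numerically on a tiny code set. For any recognition distribution q over code vectors, the expected description length F(q) = Σ_c q(c)E(c) + Σ_c q(c)log q(c) upper-bounds the Helmholtz free energy -log Σ_c exp(-E(c)), with equality exactly when q is the Boltzmann distribution. The energies below are arbitrary illustrative numbers, not outputs of a trained generative model.

```python
import numpy as np

E = np.array([1.0, 2.5, 0.3, 4.0])          # energy of each code vector
p = np.exp(-E) / np.exp(-E).sum()           # exact Boltzmann distribution

def free_energy(q):
    """Expected description length: expected energy minus entropy of q."""
    return float(np.sum(q * E) + np.sum(q * np.log(q)))

F_min = -np.log(np.exp(-E).sum())           # true Helmholtz free energy
q_approx = np.full(4, 0.25)                 # crude recognition distribution

print(F_min, free_energy(p), free_energy(q_approx))
```

free_energy(p) matches F_min to machine precision, while the cruder recognition distribution gives a strictly larger value; the gap is the KL divergence from q to p, which is why a poor bound can still serve as a Lyapunov function, since learning only has to reduce it.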


Dual Mechanisms for Neural Binding and Segmentation

Neural Information Processing Systems

We propose that the binding and segmentation of visual features is mediated by two complementary mechanisms: a low-resolution, spatial-based, resource-free process and a high-resolution, temporal-based, resource-limited process. In the visual cortex, the former depends upon the orderly topographic organization in striate and extrastriate areas, while the latter may be related to observed temporal relationships between neuronal activities. Computer simulations illustrate the role the two mechanisms play in figure/ground discrimination, depth-from-occlusion, and the vividness of perceptual completion. 1 COMPLEMENTARY BINDING MECHANISMS The "binding problem" is a classic problem in computational neuroscience which considers how neuronal activities are grouped to create mental representations. For the case of visual processing, the binding of neuronal activities requires a mechanism for selectively grouping fragmented visual features in order to construct the coherent representations (i.e.