Optimal Depth Neural Networks for Multiplication and Related Problems
Siu, Kai-Yeung, Roychowdhury, Vwani
An artificial neural network (ANN) is commonly modeled by a threshold circuit, a network of interconnected processing units called linear threshold gates. The depth of a network represents the number of unit delays or the time for parallel computation. The size of a circuit is the number of gates and measures the amount of hardware. It is known that traditional logic circuits consisting of only unbounded fan-in AND, OR, NOT gates require at least Ω(log n/log log n) depth to compute common arithmetic functions such as the product or the quotient of two n-bit numbers, unless we allow the size (and fan-in) to increase exponentially (in n). We show in this paper that ANNs can be much more powerful than traditional logic circuits.
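For readers unfamiliar with the model, a linear threshold gate outputs 1 when a weighted sum of its Boolean inputs reaches a threshold. The Python sketch below is only an illustration of that definition, not material from the paper; the weights and threshold are my own choices, realizing the 3-input MAJORITY function with a single gate.

    import numpy as np

    def threshold_gate(x, w, t):
        """Linear threshold gate: output 1 if w . x >= t, else 0."""
        return int(np.dot(w, x) >= t)

    # 3-input MAJORITY as one threshold gate (illustrative weights/threshold).
    w = np.array([1, 1, 1])
    t = 2
    for x in [(0, 0, 1), (0, 1, 1), (1, 1, 1)]:
        print(x, threshold_gate(np.array(x), w, t))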
Learning Cellular Automaton Dynamics with Neural Networks
We have trained networks of Σ-Π units with short-range connections to simulate simple cellular automata that exhibit complex or chaotic behaviour. Three levels of learning are possible (in decreasing order of difficulty): learning the underlying automaton rule, learning asymptotic dynamical behaviour, and learning to extrapolate the training history. The levels of learning achieved with and without weight sharing for different automata provide new insight into their dynamics.
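To make the learning target concrete, the sketch below generates a training trajectory from an elementary one-dimensional binary cellular automaton with periodic boundaries; the choice of rule 110 and the radius-1 neighbourhood are my assumptions for illustration, not taken from the paper.

    import numpy as np

    def ca_step(state, rule=110):
        """One synchronous update of an elementary (radius-1) binary CA."""
        rule_bits = [(rule >> k) & 1 for k in range(8)]
        left, right = np.roll(state, 1), np.roll(state, -1)
        code = 4 * left + 2 * state + right          # neighbourhood code 0..7
        return np.array([rule_bits[c] for c in code])

    # Build a training history: a random initial configuration iterated in time.
    state = np.random.randint(0, 2, size=64)
    history = [state]
    for _ in range(100):
        state = ca_step(state)
        history.append(state)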
Analogy -- Watershed or Waterloo? Structural alignment and the development of connectionist models of analogy
Gentner, Dedre, Markman, Arthur B.
Neural network models have been criticized for their inability to make use of compositional representations. In this paper, we describe a series of psychological phenomena that demonstrate the role of structured representations in cognition. These findings suggest that people compare relational representations via a process of structural alignment. This process will have to be captured by any model of cognition, symbolic or subsymbolic.
History-Dependent Attractor Neural Networks
Meilijson, Isaac, Ruppin, Eytan
We present a methodological framework enabling a detailed description of the performance of Hopfield-like attractor neural networks (ANNs) in the first two iterations. Using the Bayesian approach, we find that performance is improved when a history-based term is included in the neuron's dynamics. A further enhancement of the network's performance is achieved by judiciously choosing the censored neurons (those which become active in a given iteration) on the basis of the magnitude of their post-synaptic potentials. The contribution of biologically plausible, censored, history-dependent dynamics is especially marked in conditions of low firing activity and sparse connectivity, two important characteristics of the mammalian cortex. In such networks, the performance attained is higher than the performance of two 'independent' iterations, which represents an upper bound on the performance of history-independent networks.
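One simple way to picture a history-based term is to mix the previous iteration's state into the current post-synaptic potential before thresholding. The Python sketch below is my own illustration of that idea; the mixing coefficient lam and the Hebbian storage rule are assumptions, not the authors' Bayesian derivation.

    import numpy as np

    def history_update(W, s_prev, s_curr, lam=0.5, theta=0.0):
        """One iteration whose input field adds a history term (lam * s_prev)
        to the usual post-synaptic potential W @ s_curr."""
        field = W @ s_curr + lam * s_prev
        return np.where(field > theta, 1, -1)

    # Store one pattern with an outer-product rule and run two iterations
    # from a corrupted version of it (all parameter choices are illustrative).
    n = 100
    pattern = np.random.choice([-1, 1], size=n)
    W = np.outer(pattern, pattern) / n
    np.fill_diagonal(W, 0)
    noisy = pattern * np.random.choice([1, -1], size=n, p=[0.8, 0.2])
    s1 = history_update(W, noisy, noisy, lam=0.0)   # first iteration: no history yet
    s2 = history_update(W, noisy, s1, lam=0.5)      # second iteration uses history
    print("overlap with stored pattern:", float(pattern @ s2) / n)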
Deriving Receptive Fields Using an Optimal Encoding Criterion
In unsupervised network learning, the development of the connection weights is influenced by statistical properties of the ensemble of input vectors, rather than by the degree of mismatch between the network's output and some 'desired' output. An implicit goal of such learning is that the network should transform the input so that salient features present in the input are represented at the output in a more useful form. This is often done by reducing the input dimensionality in a way that preserves the high-variance components of the input (e.g., principal component analysis, Kohonen feature maps). The principle of maximum information preservation ('infomax') is an unsupervised learning strategy that states (Linsker 1988): From a set of allowed input-output mappings (e.g., parametrized by the connection weights), choose a mapping that maximizes the (ensemble-averaged) Shannon information that the output vector conveys about the input vector, in the presence of noise.
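In symbols, with X the input vector, Y_w the noisy output produced under connection weights w, and H denoting Shannon entropy, the infomax principle quoted above amounts to choosing

    w^* = \arg\max_{w} I(X; Y_w) = \arg\max_{w} \left[ H(Y_w) - H(Y_w \mid X) \right],

where H(Y_w | X) is the entropy contributed by the noise; this is just the standard decomposition of the mutual information that the output conveys about the input.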
Neural Network Model Selection Using Asymptotic Jackknife Estimator and Cross-Validation Method
Two theorems and a lemma are presented about the use of the jackknife estimator and the cross-validation method for model selection. Theorem 1 gives the asymptotic form of the jackknife estimator. Combined with the model selection criterion, this asymptotic form can be used to obtain the fit of a model. The model selection criterion we used is the negative of the average predictive likelihood, the choice of which is based on the idea of the cross-validation method. Lemma 1 provides a formula for further exploration of the asymptotics of the model selection criterion. Theorem 2 gives an asymptotic form of the model selection criterion for the regression case, when the parameter optimization criterion has a penalty term. Theorem 2 also proves the asymptotic equivalence of Moody's model selection criterion (Moody, 1992) and the cross-validation method, when the distance measure between the response y and the regression function takes the form of a squared difference. Selecting a model for a specified problem is the key to generalization based on the training data set.
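As a purely illustrative instance of the criterion described above, the sketch below computes a leave-one-out estimate of the negative average predictive log-likelihood for polynomial regression under an assumed Gaussian noise model; the model family, noise variance, and toy data are my choices, not the paper's.

    import numpy as np

    def neg_avg_pred_loglik(x, y, fit, predict, sigma2=0.01):
        """Leave-one-out negative average predictive log-likelihood,
        assuming Gaussian observation noise with variance sigma2."""
        n = len(y)
        total = 0.0
        for i in range(n):
            mask = np.arange(n) != i
            params = fit(x[mask], y[mask])        # refit without point i
            resid = y[i] - predict(params, x[i])  # predict the held-out point
            total += 0.5 * (np.log(2 * np.pi * sigma2) + resid ** 2 / sigma2)
        return total / n

    # Compare two polynomial models on toy data (illustrative only).
    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 30)
    y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(30)
    for degree in (1, 5):
        score = neg_avg_pred_loglik(
            x, y,
            fit=lambda xs, ys, d=degree: np.polyfit(xs, ys, d),
            predict=lambda p, xi: np.polyval(p, xi))
        print("degree", degree, "criterion", score)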
Learning Control Under Extreme Uncertainty
A peg-in-hole insertion task is used as an example to illustrate the utility of direct associative reinforcement learning methods for learning control under real-world conditions of uncertainty and noise. Task complexity due to the use of an unchamfered hole and a clearance of less than 0.2 mm is compounded by the presence of positional uncertainty of magnitude exceeding 10 to 50 times the clearance. Despite this extreme degree of uncertainty, our results indicate that direct reinforcement learning can be used to learn a robust reactive control strategy that results in skillful peg-in-hole insertions.
Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping
Moore, Andrew W., Atkeson, Christopher G.
We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space.
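A minimal sketch of the prioritized-sweeping idea for the prediction case, where backups are scheduled by the size of the value change they would cause; the tabular representation, priority-queue details, and thresholds here are my assumptions for illustration, not the authors' implementation.

    import heapq

    def prioritized_sweeping(V, T, R, predecessors, s0, gamma=0.95,
                             theta=1e-3, max_backups=1000):
        """Tabular prioritized sweeping for value prediction.
        T[s]: list of (next_state, probability); R[s]: expected reward;
        predecessors[s]: states that can transition into s."""
        def backup(s):
            return R[s] + gamma * sum(p * V[sp] for sp, p in T[s])

        queue = [(-abs(backup(s0) - V[s0]), s0)]
        for _ in range(max_backups):
            if not queue:
                break
            _, s = heapq.heappop(queue)            # state with highest priority
            new_v = backup(s)
            change = abs(new_v - V[s])
            V[s] = new_v
            if change > theta:                     # propagate surprise backwards
                for pred in predecessors[s]:
                    priority = abs(backup(pred) - V[pred])
                    if priority > theta:
                        heapq.heappush(queue, (-priority, pred))
        return V

Predecessor states are re-queued only when the change at the popped state is large enough to matter, which is what lets the sweeps concentrate computation on the parts of the state space most affected by recent observations.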