 Perceptrons


Analytical Study of the Interplay between Architecture and Predictability

Neural Information Processing Systems

We study model feedforward networks as time series predictors in the stationary limit. The focus is on complex, yet non-chaotic, behavior. The main question we address is whether the asymptotic behavior is governed by the architecture, regardless of the details of the weights. We find hierarchies among classes of architectures with respect to the attractor dimension of the long-term sequence they are capable of generating; a larger number of hidden units can generate higher-dimensional attractors. In the case of a perceptron, we develop the stationary solution for general weights, and show that the flow is typically one-dimensional.


Regularisation in Sequential Learning Algorithms

Neural Information Processing Systems

In this paper, we discuss regularisation in online/sequential learning algorithms. In environments where data arrives sequentially, techniques such as cross-validation to achieve regularisation or model selection are not possible. Further, bootstrapping to determine a confidence level is not practical. To surmount these problems, a minimum variance estimation approach that makes use of the extended Kalman algorithm for training multi-layer perceptrons is employed. The novel contribution of this paper is to show the theoretical links between extended Kalman filtering, Sutton's variable learning rate algorithms and MacKay's Bayesian estimation framework. In doing so, we propose algorithms to overcome the need for heuristic choices of the initial conditions and noise covariance matrices in the Kalman approach.


The Efficiency and the Robustness of Natural Gradient Descent Learning Rule

Neural Information Processing Systems

The inverse of the Fisher information matrix is used in the natural gradient descent algorithm to train single-layer and multi-layer perceptrons. We have discovered a new scheme to represent the Fisher information matrix of a stochastic multi-layer perceptron. Based on this scheme, we have designed an algorithm to compute the natural gradient. When the input dimension n is much larger than the number of hidden neurons, the complexity of this algorithm is of order O(n). It is confirmed by simulations that the natural gradient descent learning rule is not only efficient but also robust.


The Storage Capacity of a Fully-Connected Committee Machine

Neural Information Processing Systems

We study the storage capacity of a fully-connected committee machine with a large number K of hidden nodes. The storage capacity is obtained by analyzing the geometrical structure of the weight space related to the internal representation.


Data-Dependent Structural Risk Minimization for Perceptron Decision Trees

Neural Information Processing Systems

This paper presents a neural model of pre-attentive visual processing. The model explains why certain displays can be processed very fast, "in parallel", while others require slower, "serial" processing, in subsequent attentional systems. Our approach stems from the observation that the visual environment is overflowing with diverse information, but the biological information-processing systems analyzing it have a limited capacity [1]. This apparent mismatch suggests that data compression should be performed at an early stage of perception, and that via an accompanying process of dimension reduction, only a few essential features of the visual display should be retained. We propose that only parallel displays incorporate global features that enable fast target detection, and hence they can be processed pre-attentively, with all items (target and distractors) examined at once.


Use of a Multi-Layer Perceptron to Predict Malignancy in Ovarian Tumors

Neural Information Processing Systems

Here we study the continuous-time, continuous state-space stochastic case, which covers a wide variety of control problems including target, viability, and optimization problems (see [FS93], [KP95]), for which a formalism is the following.

