Perceptrons
Analytical Study of the Interplay between Architecture and Predictability
Priel, Avner, Kanter, Ido, Kessler, David A.
We study model feed forward networks as time series predictors in the stationary limit. The focus is on complex, yet non-chaotic, behavior. The main question we address is whether the asymptotic behavior is governed by the architecture, regardless the details of the weights. We find hierarchies among classes of architectures with respect to the attract or dimension of the long term sequence they are capable of generating; larger number of hidden units can generate higher dimensional attractors. In the case of a perceptron, we develop the stationary solution for general weights, and show that the flow is typically one dimensional.
Regularisation in Sequential Learning Algorithms
Freitas, João F. G. de, Niranjan, Mahesan, Gee, Andrew H.
In this paper, we discuss regularisation in online/sequential learning algorithms. In environments where data arrives sequentially, techniques such as cross-validation to achieve regularisation or model selection are not possible. Further, bootstrapping to determine a confidence level is not practical. To surmount these problems, a minimum variance estimation approach that makes use of the extended Kalman algorithm for training multi-layer perceptrons is employed. The novel contribution of this paper is to show the theoretical links between extended Kalman filtering, Sutton's variable learning rate algorithms and Mackay's Bayesian estimation framework. In doing so, we propose algorithms to overcome the need for heuristic choices of the initial conditions and noise covariance matrices in the Kalman approach.
The Efficiency and the Robustness of Natural Gradient Descent Learning Rule
Yang, Howard Hua, Amari, Shun-ichi
The inverse of the Fisher information matrix is used in the natural gradient descent algorithm to train single-layer and multi-layer perceptrons. We have discovered a new scheme to represent the Fisher information matrix of a stochastic multi-layer perceptron. Based on this scheme, we have designed an algorithm to compute the natural gradient. When the input dimension n is much larger than the number of hidden neurons, the complexity of this algorithm is of order O(n). It is confirmed by simulations that the natural gradient descent learning rule is not only efficient but also robust.
Data-Dependent Structural Risk Minimization for Perceptron Decision Trees
Shawe-Taylor, John, Cristianini, Nello
This paper presents a neural-model of pre-attentive visual processing. The model explains why certain displays can be processed very fast, "in parallel", while others require slower, "serial" processing, in subsequent attentional systems. Our approach stems from the observation that the visual environment is overflowing with diverse information, but the biological information-processing systems analyzing it have a limited capacity [1]. This apparent mismatch suggests that data compression should be performed at an early stage of perception, and that via an accompanying process of dimension reduction, only a few essential features of the visual display should be retained. We propose that only parallel displays incorporate global features that enable fast target detection, and hence they can be processed pre-attentively, with all items (target and dis tractors) examined at once.
Analytical Study of the Interplay between Architecture and Predictability
Priel, Avner, Kanter, Ido, Kessler, David A.
We study model feed forward networks as time series predictors in the stationary limit. The focus is on complex, yet non-chaotic, behavior. The main question we address is whether the asymptotic behavior is governed by the architecture, regardless the details of the weights. We find hierarchies among classes of architectures with respect to the attract or dimension of the long term sequence they are capable of generating; larger number of hidden units can generate higher dimensional attractors. In the case of a perceptron, we develop the stationary solution for general weights, and show that the flow is typically one dimensional.
Analytical Study of the Interplay between Architecture and Predictability
Priel, Avner, Kanter, Ido, Kessler, David A.
We study model feed forward networks as time series predictors in the stationary limit. The focus is on complex, yet non-chaotic, behavior. The main question we address is whether the asymptotic behavior is governed by the architecture, regardless the details of the weights. We find hierarchies among classes of architectures with respect to the attract or dimension of the long term sequence they are capable of generating; larger number of hidden units can attractors. In the case of a perceptron,generate higher dimensional the stationary solution for general weights, and showwe develop that the flow is typically one dimensional.
Regularisation in Sequential Learning Algorithms
Freitas, João F. G. de, Niranjan, Mahesan, Gee, Andrew H.
In this paper, we discuss regularisation in online/sequential learning algorithms.In environments where data arrives sequentially, techniques such as cross-validation to achieve regularisation or model selection are not possible. Further, bootstrapping to determine aconfidence level is not practical. To surmount these problems, a minimum variance estimation approach that makes use of the extended Kalman algorithm for training multi-layer perceptrons isemployed. The novel contribution of this paper is to show the theoretical links between extended Kalman filtering, Sutton's variable learning rate algorithms and Mackay's Bayesian estimation framework.In doing so, we propose algorithms to overcome the need for heuristic choices of the initial conditions and noise covariance matrices in the Kalman approach.