Asia
An Optimization Method of Layered Neural Networks based on the Modified Information Criterion
This paper proposes a practical optimization method for layered neural networks, by which the optimal model and parameter can be found simultaneously. 'i\Te modify the conventional information criterion into a differentiable function of parameters, and then, minimize it, while controlling it back to the ordinary form. Effectiveness of this method is discussed theoretically and experimentally.
Learning in Compositional Hierarchies: Inducing the Structure of Objects from Data
Model-based object recognition solves the problem of invariant recognition by relying on stored prototypes at unit scale positioned at the origin of an object-centered coordinate system. Elastic matching techniques are used to find a correspondence between features of the stored model and the data and can also compute the parameters of the transformation the observed instance has undergone relative to the stored model.
Constructive Learning Using Internal Representation Conflicts
Leerink, Laurens R., Jabri, Marwan A.
The first class of network adaptation algorithms start out with a redundant architecture and proceed by pruning away seemingly unimportant weights (Sietsma and Dow, 1988; Le Cun et aI, 1990). A second class of algorithms starts off with a sparse architecture and grows the network to the complexity required by the problem. Several algorithms have been proposed for growing feedforward networks. The upstart algorithm of Frean (1990) and the cascade-correlation algorithm of Fahlman (1990) are examples of this approach.
A Comparative Study of a Modified Bumptree Neural Network with Radial Basis Function Networks and the Standard Multi Layer Perceptron
Bostock, Richard T. J., Harget, Alan J.
Bumptrees are geometric data structures introduced by Omohundro (1991) to provide efficient access to a collection of functions on a Euclidean space of interest. We describe a modified bumptree structure that has been employed as a neural network classifier, and compare its performance on several classification tasks against that of radial basis function networks and the standard mutIi-Iayer perceptron. 1 INTRODUCTION A number of neural network studies have demonstrated the utility of the multi-layer perceptron (MLP) and shown it to be a highly effective paradigm. Studies have also shown, however, that the MLP is not without its problems, in particular it requires an extensive training time, is susceptible to local minima problems and its perfonnance is dependent upon its internal network architecture. In an attempt to improve upon the generalisation performance and computational efficiency a number of studies have been undertaken principally concerned with investigating the parametrisation of the MLP. It is well known, for example, that the generalisation performance of the MLP is affected by the number of hidden units in the network, which have to be determined empirically since theory provides no guidance.
Combined Neural Networks for Time Series Analysis
We propose a method for improving the performance of any network designed to predict the next value of a time series. Vve advocate analyzing the deviations of the network's predictions from the data in the training set. This can be carried out by a secondary network trained on the time series of these residuals. The combined system of the two networks is viewed as the new predictor. We demonstrate the simplicity and success of this method, by applying it to the sunspots data. The small corrections of the secondary network can be regarded as resulting from a Taylor expansion of a complex network which includes the combined system.
The Power of Amnesia
Ron, Dana, Singer, Yoram, Tishby, Naftali
We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multi characteristic time scales. On short scales it is characterized mostly by the dynamics that generate the process, whereas on large scales, more syntactic and semantic information is carried. For that reason the conventionally used fixed memory Markov models cannot capture effectively the complexity of such structures. On the other hand using long memory models uniformly is not practical even for as short memory as four.
Training Neural Networks with Deficient Data
Tresp, Volker, Ahmad, Subutai, Neuneier, Ralph
We analyze how data with uncertain or missing input features can be incorporated into the training of a neural network. The general solution requires a weighted integration over the unknown or uncertain input although computationally cheaper closed-form solutions can be found for certain Gaussian Basis Function (GBF) networks. We also discuss cases in which heuristical solutions such as substituting the mean of an unknown input can be harmful.
Supervised learning from incomplete data via an EM approach
Ghahramani, Zoubin, Jordan, Michael I.
Real-world learning tasks may involve high-dimensional data sets with arbitrary patterns of missing data. In this paper we present a framework based on maximum likelihood density estimation for learning from such data set.s. VVe use mixture models for the density estimates and make two distinct appeals to the Expectation Maximization (EM) principle (Dempster et al., 1977) in deriving a learning algorithm-EM is used both for the estimation of mixture components and for coping wit.h missing dat.a. The resulting algorithm is applicable t.o a wide range of supervised as well as unsupervised learning problems.
Structural and Behavioral Evolution of Recurrent Networks
Saunders, Gregory M., Angeline, Peter J., Pollack, Jordan B.
This paper introduces GNARL, an evolutionary program which induces recurrent neural networks that are structurally unconstrained. In contrast to constructive and destructive algorithms, GNARL employs a population of networks and uses a fitness function's unsupervised feedback to guide search through network space. Annealing is used in generating both gaussian weight changes and structural modifications. Applying GNARL to a complex search and collection task demonstrates that the system is capable of inducing networks with complex internal dynamics.
Credit Assignment through Time: Alternatives to Backpropagation
Bengio, Yoshua, Frasconi, Paolo
Learning to recognize or predict sequences using long-term context has many applications. However, practical and theoretical problems are found in training recurrent neural networks to perform tasks in which input/output dependencies span long intervals. Starting from a mathematical analysis of the problem, we consider and compare alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled. Results on the new algorithms show performance qualitatively superior to that obtained with backpropagation. 1 Introduction Recurrent neural networks have been considered to learn to map input sequences to output sequences. Machines that could efficiently learn such tasks would be useful for many applications involving sequence prediction, recognition or production. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. In fact, we can prove that dynamical systems such as recurrent neural networks will be increasingly difficult to train with gradient descent as the duration of the dependencies to be captured increases. A mathematical analysis of the problem shows that either one of two conditions arises in such systems.