Genre
Green's Function Method for Fast On-Line Learning Algorithm of Recurrent Neural Networks
Sun, Guo-Zheng, Chen, Hsing-Hen, Lee, Yee-Chun
The two well known learning algorithms of recurrent neural networks are the back-propagation (Rumelhart & el al., Werbos) and the forward propagation (Williamsand Zipser). The main drawback of back-propagation is its off-line backward path in time for error cumulation. This violates the online requirement in many practical applications. Although the forward propagation algorithmcan be used in an online manner, the annoying drawback is the heavy computation load required to update the high dimensional sensitivity matrix(0(fir) operations for each time step). Therefore, to develop a fast forward algorithm is a challenging task.
Hierarchical Transformation of Space in the Visual System
Pouget, Alexandre, Fisher, Stephen A., Sejnowski, Terrence J.
Neurons encoding simple visual features in area VI such as orientation, direction of motion and color are organized in retinotopic maps. However, recentphysiological experiments have shown that the responses of many neurons in VI and other cortical areas are modulated by the direction ofgaze. We have developed a neural network model of the visual cortex to explore the hypothesis that visual features are encoded in headcentered coordinatesat early stages of visual processing. New experiments are suggested for testing this hypothesis using electrical stimulations and psychophysical observations.
Induction of Finite-State Automata Using Second-Order Recurrent Networks
Watrous, Raymond L., Kuhn, Gary M.
By a method of heuristic search over the space of finite state automata with up to eight states, he was able to induce a recognizer for each of these languages (Tomita, 1982). Recognizers of finite-state languages have also been induced using first-order recurrent connectionistnetworks (Elman, 1990; Williams and Zipser, 1988; Cleeremans, Servan-Schreiber and McClelland, 1989). Generally speaking, these results were obtained by training the network to predict the next symbol (Cleeremans, Servan-Schreiber and McClelland, 1989; Williams and Zipser, 1988), rather than by training the network to accept or reject strings of different .lengths. Several training algorithms used an approximation to the gradient (Elman, 1990; Cleeremans, Servan-Schreiberand McClelland, 1989) by truncating the computation of the backward recurrence. The problem of inducing languages from examples has also been approached using second-order recurrent networks (Pollack, 1990; Giles et al., 1990). Using a truncated approximationto the gradient, and Tomita's training sets, Pollack reported that "none of the ideal languages were induced" (Pollack, 1990). On the other hand, a Tomita language has been induced using the complete gradient (Giles et al., 1991). This paper reports the induction of several Tomita languages and the extraction of the corresponding automata with certain differences in method from (Giles et al., 1991).
Neural Network - Gaussian Mixture Hybrid for Speech Recognition or Density Estimation
Bengio, Yoshua, Mori, Renato De, Flammia, Giovanni, Kompe, Ralf
The subject of this paper is the integration of multi-layered Artificial Neural Networks(ANN) with probability density functions such as Gaussian mixtures found in continuous density Hidden Markov Models (HMM). In the first part of this paper we present an ANN/HMM hybrid in which all the parameters of the the system are simultaneously optimized with respect to a single criterion. In the second part of this paper, we study the relationship between the density of the inputs of the network and the density of the outputs of the networks. A few experiments are presented to explore how to perform density estimation with ANNs. 1 INTRODUCTION This paper studies the integration of Artificial Neural Networks (ANN) with probability densityfunctions (pdf) such as the Gaussian mixtures often used in continuous density Hidden Markov Models. The ANNs considered here are multi-layered or recurrent networks with hyperbolic tangent hidden units.
Practical Issues in Temporal Difference Learning
This paper examines whether temporal difference methods for training connectionist networks, such as Suttons's TO('\) algorithm, can be successfully appliedto complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. Thesepractical issues are then examined in the context of a case study in which TO('\) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex nontrivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. The hidden units in these network have apparently discovered useful features, a longstanding goal of computer games research.
Human and Machine 'Quick Modeling'
Bernasconi, Jakob, Gustafson, Karl
We present here an interesting experiment in'quick modeling' by humans, performed independently on small samples, in several languages and two continents, over the last three years. Comparisons to decision tree procedures andneural net processing are given. From these, we conjecture that human reasoning is better represented by the latter, but substantially different fromboth. Implications for the'strong convergence hypothesis' between neuralnetworks and machine learning are discussed, now expanded to include human reasoning comparisons. 1 INTRODUCTION Until recently the fields of symbolic and connectionist learning evolved separately. Suddenly in the last two years a significant number of papers comparing the two methodologies have appeared. A beginning synthesis of these two fields was forged at the NIPS '90 Workshop #5 last year (Pratt and Norton, 1990), where one may find a good bibliography of the recent work of Atlas, Dietterich, Omohundro, Sanger, Shavlik, Tsoi, Utgoff and others. It was at that NIPS '90 Workshop that we learned of these studies, most of which concentrate on performance comparisons of decision tree algorithms (such as ID3, CART) and neural net algorithms (such as Perceptrons, Backpropagation). Independently threeyears ago we had looked at Quinlan's ID3 scheme (Quinlan, 1984) and intuitively and rather instantly not agreeing with the generalization he obtains by ID3 from a sample of 8 items generalized to 12 items, we subjected this example to a variety of human experiments. We report our findings, as compared to the performance of ID3 and also to various neural net computations.
Improving the Performance of Radial Basis Function Networks by Learning Center Locations
Wettschereck, Dietrich, Dietterich, Thomas
Three methods for improving the performance of (gaussian) radial basis function (RBF) networks were tested on the NETtaik task. In RBF, a new example is classified by computing its Euclidean distance to a set of centers chosen by unsupervised methods. The application of supervised learning to learn a non-Euclidean distance metric was found to reduce the error rate of RBF networks, while supervised learning of each center's variance resultedin inferior performance. The best improvement in accuracy was achieved by networks called generalized radial basis function (GRBF) networks. In GRBF, the center locations are determined by supervised learning. After training on 1000 words, RBF classifies 56.5% of letters correct, while GRBF scores 73.4% letters correct (on a separate test set). From these and other experiments, we conclude that supervised learning of center locations can be very important for radial basis function learning.
Information Measure Based Skeletonisation
Ramachandran, Sowmya, Pratt, Lorien Y.
Automatic determination of proper neural network topology by trimming oversized networks is an important area of study, which has previously been addressed using a variety of techniques. In this paper, we present Information Measure Based Skeletonisation (IMBS), a new approach to this problem where superfluous hidden units are removed based on their information measure (1M). This measure, borrowed from decision tree induction techniques,reflects the degree to which the hyperplane formed by a hidden unit discriminates between training data classes. We show the results of applying IMBS to three classification tasks and demonstrate that it removes a substantial number of hidden units without significantly affecting network performance.