Technology
Information, Prediction, and Query by Committee
Freund, Yoav, Seung, H. Sebastian, Shamir, Eli, Tishby, Naftali
We analyze the "query by committee" algorithm, a method for filtering informativequeries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gainwith positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of thresholded smooth functions.
Kohonen Feature Maps and Growing Cell Structures - a Performance Comparison
A performance comparison of two self-organizing networks, the Kohonen Feature Map and the recently proposed Growing Cell Structures, is made. For this purpose several performance criteria for self-organizing networks are proposed and motivated. The models are tested with three example problems of increasing difficulty. The Kohonen Feature Map demonstrates slightly superior results only for the simplest problem.
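For reference, a minimal sketch of the standard Kohonen update that the Feature Map performs on each input; this is textbook self-organizing-map arithmetic, not the comparison code used in the paper (Growing Cell Structures replaces the fixed grid with an adaptively grown one).

```python
import numpy as np

def kohonen_step(weights, x, lr, sigma):
    """One Kohonen Feature Map update (a sketch).

    weights: array of shape (rows, cols, dim), one codebook vector
    per map unit. The best-matching unit (BMU) and its grid
    neighbors are pulled toward the input x, with the pull decaying
    as a Gaussian of grid distance from the BMU.
    """
    rows, cols, _ = weights.shape
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), (rows, cols))
    for r in range(rows):
        for c in range(cols):
            grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
            h = np.exp(-grid_d2 / (2 * sigma ** 2))   # neighborhood kernel
            weights[r, c] += lr * h * (x - weights[r, c])
    return weights
```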
Second order derivatives for network pruning: Optimal Brain Surgeon
Hassibi, Babak, Stork, David G.
We investigate the use of information from all second order derivatives of the error function to perform network pruning (i.e., removing unimportant weights from a trained network) in order to improve generalization, simplify networks, reduce hardware or storage requirements, increase the speed of further training, and in some cases enable rule extraction. Our method, Optimal Brain Surgeon (OBS), is significantly better than magnitude-based methods and Optimal Brain Damage [Le Cun, Denker and Solla, 1990], which often remove the wrong weights. OBS permits the pruning of more weights than other methods (for the same error on the training set), and thus yields better generalization on test data. Crucial to OBS is a recursion relation for calculating the inverse Hessian matrix H⁻¹ from training data and structural information of the net. OBS permits a 90%, a 76%, and a 62% reduction in weights over backpropagation with weight decay on three benchmark MONK's problems [Thrun et al., 1991]. Of OBS, Optimal Brain Damage, and magnitude-based methods, only OBS deletes the correct weights from a trained XOR network in every case. Finally, whereas Sejnowski and Rosenberg [1987] used 18,000 weights in their NETtalk network, we used OBS to prune a network to just 1560 weights, yielding better generalization.
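Once the inverse Hessian is in hand, the pruning step itself is compact. The sketch below implements the published saliency and weight-update formulas; computing H⁻¹ by the paper's recursion over the training data is assumed to be done elsewhere.

```python
import numpy as np

def obs_prune_one(w, H_inv):
    """One Optimal Brain Surgeon pruning step (a sketch).

    Picks the weight q with the smallest saliency
        L_q = w_q**2 / (2 * H_inv[q, q])
    and compensates all remaining weights with
        dw = -(w_q / H_inv[q, q]) * H_inv[:, q],
    which drives w[q] exactly to zero.
    """
    saliency = w ** 2 / (2.0 * np.diag(H_inv))
    q = int(np.argmin(saliency))                # least important weight
    dw = -(w[q] / H_inv[q, q]) * H_inv[:, q]    # compensating update
    return w + dw, q, saliency[q]
```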
Reinforcement Learning Applied to Linear Quadratic Regulation
Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no convergence proofs for problems with continuous state and action spaces, or for systems involving nonlinear function approximators (such as multilayer perceptrons). This paper presents research applying DP-based reinforcement learning theory to Linear Quadratic Regulation (LQR), an important class of control problems involving continuous state and action spaces and requiring a simple type of nonlinear function approximator. We describe an algorithm based on Q-learning that is proven to converge to the optimal controller for a large class of LQR problems. We also describe a slightly different algorithm that is only locally convergent to the optimal Q-function, demonstrating one of the possible pitfalls of using a nonlinear function approximator with DP-based learning.
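The structural fact such an algorithm exploits is that for LQR the optimal Q-function is quadratic in the joint state-action vector, so the greedy controller can be read off in closed form. A sketch of that extraction step follows (the estimation of the quadratic parameter matrix H from data is the Q-learning part, omitted here).

```python
import numpy as np

def greedy_gain_from_Q(H, n, m):
    """Extract the greedy linear controller from a quadratic
    Q-function (a sketch of the LQR structure, not the paper's code).

    With Q(x, u) = [x; u]^T H [x; u] and H partitioned into state
    and action blocks, minimizing over u gives u = -K x with
    K = inv(H_uu) @ H_ux.
    """
    H_uu = H[n:n + m, n:n + m]       # action-action block
    H_ux = H[n:n + m, :n]            # action-state block
    K = np.linalg.solve(H_uu, H_ux)
    return K                         # greedy policy is u = -K @ x
```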
Combining Neural and Symbolic Learning to Revise Probabilistic Rule Bases
Mahoney, J. Jeffrey, Mooney, Raymond J.
Recently, both connectionist and symbolic methods have been developed for biasing learning with prior knowledge [Fu, 1989; Towell et al., 1990; Ourston and Mooney, 1990]. Most of these methods revise an imperfect knowledge base (usually obtained from a domain expert) to fit a set of empirical data. Some of these methods have been successfully applied to real-world tasks, such as recognizing promoter sequences in DNA [Towell et al., 1990; Ourston and Mooney, 1990]. The results demonstrate that revising an expert-given knowledge base produces more accurate results than learning from training data alone. In this paper, we describe the RAPTURE system (Revising Approximate Probabilistic Theories Using Repositories of Examples), which combines connectionist and symbolic methods to revise both the parameters and structure of a certainty-factor rule base.
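For concreteness, the parameters RAPTURE revises are certainty factors, whose standard (MYCIN-style) evidence-combination arithmetic is sketched below; this is the textbook rule, not the system's actual implementation.

```python
def combine_cf(cf1, cf2):
    """Combine two certainty factors (each in [-1, 1]) bearing on
    the same conclusion, MYCIN-style (a sketch of standard CF
    arithmetic, assumed rather than taken from the paper).
    """
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 - cf1 * cf2          # reinforcing belief
    if cf1 <= 0 and cf2 <= 0:
        return cf1 + cf2 + cf1 * cf2          # reinforcing disbelief
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))  # conflicting evidence
```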
Planar Hidden Markov Modeling: From Speech to Optical Character Recognition
Levin, Esther, Pieraccini, Roberto
We propose in this paper a statistical model (planar hidden Markov model - PHMM) describing statistical properties of images. The model generalizes the single-dimensional HMM, used for speech processing, to the planar case. For this model to be useful, an efficient segmentation algorithm, similar to the Viterbi algorithm for HMM, must exist. We present conditions in terms of the PHMM parameters that are sufficient to guarantee that the planar segmentation problem can be solved in polynomial time, and describe an algorithm for it. This algorithm optimally aligns the image with the model, and is therefore insensitive to elastic distortions of images. Using this algorithm, a joint optimal segmentation and recognition of the image can be performed, thus overcoming the weakness of traditional OCR systems, where segmentation is performed independently before recognition, leading to unrecoverable recognition errors. The PHMM approach was evaluated using a set of isolated hand-written digits. An overall digit recognition accuracy of 95% was achieved. An analysis of the results showed that even in the simple case of recognition of isolated characters, the elimination of elastic distortions enhances the performance significantly. We expect the advantage of this approach to be even more significant for tasks such as connected writing recognition/spotting, for which there is no known high-accuracy method of recognition.
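For reference, the one-dimensional dynamic program that the planar segmentation generalizes is the standard Viterbi alignment, sketched below in log space; the planar version and its polynomial-time conditions are the paper's contribution and are not reproduced here.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Standard 1-D Viterbi alignment (a sketch).

    log_pi[i]  : initial state log-probabilities, shape (S,)
    log_A[i,j] : transition log-probabilities, shape (S, S)
    log_B[i,t] : emission log-likelihoods per frame, shape (S, T)
    Returns the most likely state sequence for the T frames.
    """
    S, T = log_B.shape
    delta = log_pi + log_B[:, 0]
    back = np.zeros((S, T), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A         # scores[i, j]: reach j via i
        back[:, t] = np.argmax(scores, axis=0)
        delta = scores[back[:, t], np.arange(S)] + log_B[:, t]
    path = [int(np.argmax(delta))]              # best final state
    for t in range(T - 1, 0, -1):               # trace back
        path.append(int(back[path[-1], t]))
    return path[::-1]
```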
Interposing an ontogenetic model between Genetic Algorithms and Neural Networks
The relationships between learning, development, and evolution in Nature are taken seriously, to suggest a model of the developmental process whereby the genotypes manipulated by the Genetic Algorithm (GA) might be expressed to form phenotypic neural networks (NNets) that then go on to learn. ONTOL is a grammar for generating polynomial NNets for time-series prediction. Genomes correspond to an ordered sequence of ONTOL productions and define a grammar that is expressed to generate a NNet. The NNet's weights are then modified by learning, and the individual's prediction error is used to determine GA fitness. A new gene doubling operator appears critical to the formation of new genetic alternatives in the preliminary but encouraging results presented.
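The gene doubling operator is easy to state under an assumed representation (a genome as a flat list of ONTOL production identifiers; the paper's actual encoding may differ):

```python
import random

def gene_doubling(genome, rng=random):
    """Duplicate one randomly chosen gene in place (a sketch).

    One copy can keep its current developmental role while the other
    is free to mutate, creating new genetic alternatives.
    """
    i = rng.randrange(len(genome))
    return genome[:i + 1] + [genome[i]] + genome[i + 1:]
```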
Q-Learning with Hidden-Unit Restarting
Platt's resource-allocation network (RAN) (Platt, 1991a, 1991b) is modified for a reinforcement-learning paradigm and to "restart" existing hidden units rather than adding new units. After restarting, units continue to learn via back-propagation. The resulting restart algorithm is tested in a Q-learning network that learns to solve an inverted pendulum problem. Solutions are found faster on average with the restart algorithm than without it.
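A minimal sketch of the restart step, using a hypothetical per-unit utility score in place of RAN's allocation criterion; where Platt's network would allocate a new unit, the restart algorithm instead re-centers the least useful existing one.

```python
import numpy as np

def restart_unit(centers, widths, out_w, x, utility, width0):
    """Restart the least useful hidden unit on the current input
    (a sketch; `utility` is a hypothetical usefulness score per unit).
    """
    k = int(np.argmin(utility))   # least useful existing unit
    centers[k] = x                # re-center it on the current input
    widths[k] = width0            # reset its receptive-field width
    out_w[k] = 0.0                # let back-propagation relearn its output weight
    return k
```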
Nets with Unreliable Hidden Nodes Learn Error-Correcting Codes
In a multi-layered neural network, any one of the hidden layers can be viewed as computing a distributed representation of the input. Several "encoder" experiments have shown that when the representation space is small, it can be fully used. But computing with such a representation requires completely dependable nodes. In the case where the hidden nodes are noisy and unreliable, we find that error-correcting schemes emerge simply by using noisy units during training; random errors injected during backpropagation result in spreading representations apart. Average and minimum distances increase with misfire probability, as predicted by coding-theoretic considerations. Furthermore, the effect of this noise is to protect the machine against permanent node failure, thereby potentially extending the useful lifetime of the machine.
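A minimal sketch of the training-time noise, assuming ±1-valued hidden activations and an independent misfire probability per unit (the paper's exact noise model may differ in detail):

```python
import numpy as np

def misfire(hidden, p, rng=np.random):
    """Randomly flip hidden activations with probability p (a sketch).

    Training the rest of the network through these misfires is what
    pushes the hidden representations apart, yielding error-correcting
    codes.
    """
    flips = rng.random(hidden.shape) < p
    return np.where(flips, -hidden, hidden)
```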