Machine Learning
Learning Curves, Model Selection and Complexity of Neural Networks
Murata, Noboru, Yoshizawa, Shuji, Amari, Shun-ichi
Learning curves show how a neural network improves as the number of training examples increases and how this improvement is related to the network's complexity. The present paper clarifies the asymptotic properties of two learning curves and the relation between them: one concerning the predictive loss or generalization loss, and the other the training loss. The result gives a natural definition of the complexity of a neural network. Moreover, it provides a new criterion for model selection.
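As a hedged illustration of the shape such asymptotic results typically take (the symbols below are generic, not the paper's own notation): with L_0 the best achievable loss, t the number of training examples, and m_* an effective number of parameters, the two curves approach L_0 from opposite sides, and their gap provides both a complexity measure and a model-selection penalty.

```latex
% Illustrative notation, not taken verbatim from the paper:
% L_0 = best achievable loss, t = number of training examples,
% m_* = effective number of parameters (the complexity measure).
\begin{align}
  \langle L_{\mathrm{gen}}(t) \rangle   &\approx L_0 + \frac{m_*}{2t}, &
  \langle L_{\mathrm{train}}(t) \rangle &\approx L_0 - \frac{m_*}{2t}.
\end{align}
% The gap between the two curves, roughly m_*/t, is what a
% model-selection criterion of this kind penalizes.
```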
Analog VLSI Implementation of Multi-dimensional Gradient Descent
Kirk, David B., Kerns, Douglas, Fleischer, Kurt, Barr, Alan H.
The implementation uses noise injection and multiplicative correlation to estimate derivatives, as in [Anderson, Kerns 92]. One intended application of this technique is setting circuit parameters on-chip automatically, rather than manually [Kirk 91]. Gradient descent optimization may be used to adjust synapse weights for a backpropagation or other on-chip learning implementation. The approach combines the features of continuous multidimensional gradient descent with the potential for an annealing style of optimization. We present data measured from our analog VLSI implementation.

1 Introduction

This work is similar to [Anderson, Kerns 92], but represents two advances. First, we describe the extension of the technique to multiple dimensions. Second, we demonstrate an implementation of the multidimensional technique in analog VLSI, and provide results measured from the chip. Unlike previous work using noise sources in adaptive systems, we use the noise as a means of estimating the gradient of a function f(y), rather than performing an annealing process [Alspector 88]. We also estimate gradients continuously in position and time, in contrast to [Umminger 89] and [Jabri 91], which utilize discrete position gradient estimates.
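As a rough discrete-time sketch of the noise-injection idea (the chip operates continuously in time; the function names and the choice of Gaussian perturbations below are assumptions for illustration): correlating the injected noise with the induced change in f(y) yields, on average, a quantity proportional to the gradient.

```python
import numpy as np

def noise_gradient_estimate(f, y, sigma=0.01, n_samples=200):
    """Estimate grad f(y) by multiplicative correlation of injected
    noise with the resulting change in f, averaged over perturbations.
    E[n * (f(y + n) - f(y))] ~= sigma^2 * grad f(y) for small sigma."""
    grad = np.zeros_like(y)
    for _ in range(n_samples):
        n = np.random.normal(0.0, sigma, size=y.shape)  # injected noise
        grad += n * (f(y + n) - f(y))                   # correlation term
    return grad / (n_samples * sigma**2)

# Toy usage: gradient descent driven entirely by the noisy estimate.
f = lambda y: np.sum((y - 1.0) ** 2)   # toy objective, minimum at y = 1
y = np.zeros(4)
for _ in range(100):
    y -= 0.1 * noise_gradient_estimate(f, y)
```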
Learning Sequential Tasks by Incrementally Adding Higher Orders
An incremental, higher-order, non-recurrent network combines two properties found to be useful for learning sequential tasks: higher-order connections and incremental introduction of new units. The network adds higher orders when needed by adding new units that dynamically modify connection weights. Since the new units modify the weights at the next time-step with information from the previous step, temporal tasks can be learned without the use of feedback, thereby greatly simplifying training. Furthermore, a theoretically unlimited number of units can be added to reach into the arbitrarily distant past. Experiments with the Reber grammar have demonstrated speedups of two orders of magnitude over recurrent networks.
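As a minimal sketch of how a higher-order unit can carry previous-step information into the next step's weights without feedback (the structure and names below are assumptions for illustration, not the paper's architecture):

```python
import numpy as np

def step(x_prev, x_now, W, V):
    """One time step of a non-recurrent higher-order unit.

    The effective first-order weights are modulated by the previous
    input, so temporal context enters without recurrent connections.
    W: base weights (out, in); V: higher-order weights gated by x_prev.
    """
    W_eff = W + np.einsum("ijk,k->ij", V, x_prev)  # second-order term
    return np.tanh(W_eff @ x_now)

x_prev, x_now = np.array([1.0, 0.0]), np.array([0.0, 1.0])
W = np.zeros((3, 2))
V = np.random.normal(0.0, 0.1, size=(3, 2, 2))
y = step(x_prev, x_now, W, V)
```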
Hidden Markov Models in Molecular Biology: New Algorithms and Applications
Baldi, Pierre, Chauvin, Yves, Hunkapiller, Tim, McClure, Marcella A.
Hidden Markov Models (HMMs) can be applied to several important problems in molecular biology. We introduce a new convergent learning algorithm for HMMs that, unlike the classical Baum-Welch algorithm, is smooth and can be applied online or in batch mode, with or without the usual Viterbi most-likely-path approximation. Left-right HMMs with insertion and deletion states are then trained to represent several protein families, including immunoglobulins and kinases. In all cases, the models derived capture all the important statistical properties of the families and can be used efficiently in a number of important tasks such as multiple alignment, motif detection, and classification.
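As a minimal sketch of one way to get a smooth, normalization-preserving alternative to Baum-Welch (the softmax parameterization and the variable names below are assumptions for illustration, not necessarily the paper's exact scheme): parameterize the transition and emission probabilities by unconstrained variables and ascend the forward-algorithm log-likelihood by gradient steps, online or in batch.

```python
import numpy as np

def softmax(u, axis=-1):
    e = np.exp(u - u.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def log_likelihood(u_trans, u_emit, pi, obs):
    """Forward algorithm on a softmax-parameterized HMM.

    A (transitions) and B (emissions) are smooth functions of the
    unconstrained parameters u_trans (S x S) and u_emit (S x V), so
    gradient ascent on this value keeps them valid probabilities."""
    A, B = softmax(u_trans), softmax(u_emit)
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()        # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        ll += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return ll
```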
A Hybrid Linear/Nonlinear Approach to Channel Equalization Problems
The channel equalization problem is an important one in high-speed communications: the sequences of symbols transmitted are distorted by neighboring symbols. Traditionally, channel equalization is treated as a channel-inversion operation. One problem with this approach is that there is no direct correspondence between the error probability and the residual error produced by the channel-inversion operation. In this paper, optimal equalizer design is formulated as a classification problem. The optimal classifier can be constructed by the Bayes decision rule; in general it is nonlinear. An efficient hybrid linear/nonlinear approach is proposed to train the equalizer. The error probability of the new linear/nonlinear equalizer has been shown to be better than that of a linear equalizer on an experimental channel.
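As a toy illustration of the classification view versus channel inversion (the two-tap BPSK channel and all names below are hypothetical, chosen only to make the contrast concrete): the Bayes rule scores each symbol by the likelihood of the received sample under that symbol's channel states, and is nonlinear in general.

```python
import numpy as np
from itertools import product

# Toy BPSK channel: y_t = h0*s_t + h1*s_{t-1} + Gaussian noise.
h, sigma = np.array([1.0, 0.5]), 0.2

# Noiseless channel output for every (current, previous) symbol pair.
states = [(s0, h @ np.array([s0, s1]))
          for s0, s1 in product([-1.0, 1.0], repeat=2)]

def bayes_equalizer(y):
    """Bayes decision rule: pick the symbol whose channel states give
    the larger Gaussian-mixture likelihood (nonlinear in y)."""
    scores = {-1.0: 0.0, 1.0: 0.0}
    for s0, center in states:
        scores[s0] += np.exp(-(y - center) ** 2 / (2 * sigma**2))
    return max(scores, key=scores.get)

def linear_equalizer(y):
    """Naive channel inversion on the main tap, then a sign decision."""
    return np.sign(y / h[0])
```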
Learning to categorize objects using temporal coherence
The invariance of an object's identity as it is transformed over time provides a powerful cue for perceptual learning. We present an unsupervised learning procedure which maximizes the mutual information between the representations adopted by a feed-forward network at consecutive time steps. We demonstrate that the network can learn, entirely unsupervised, to classify an ensemble of several patterns by observing pattern trajectories, even though there are abrupt transitions from one object to another between trajectories. The same learning procedure should be widely applicable to a variety of perceptual learning tasks.

1 INTRODUCTION

A promising approach to understanding human perception is to try to model its developmental stages. There is ample evidence that much of perception is learned.
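As a minimal sketch of a temporal mutual-information objective of this kind (the batch estimator below, which treats averaged soft class assignments as a joint distribution, is an assumption for illustration rather than the paper's exact procedure):

```python
import numpy as np

def temporal_mi(p_t, p_t1, eps=1e-12):
    """Mutual information between the network's soft class assignments
    at consecutive time steps, estimated from a batch.

    p_t, p_t1: arrays of shape (batch, n_classes) holding class
    probabilities at times t and t+1. Maximizing this value rewards
    labels that are both informative and temporally coherent."""
    joint = p_t.T @ p_t1 / p_t.shape[0]              # joint over classes
    px = joint.sum(axis=1, keepdims=True)            # marginal at t
    py = joint.sum(axis=0, keepdims=True)            # marginal at t+1
    return np.sum(joint * (np.log(joint + eps) - np.log(px @ py + eps)))
```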
Harmonic Grammars for Formal Languages
Basic connectionist principles imply that grammars should take the form of systems of parallel soft constraints defining an optimization problem, the solutions to which are the well-formed structures in the language. Such Harmonic Grammars have been successfully applied to a number of problems in the theory of natural languages. Here it is shown that formal languages too can be specified by Harmonic Grammars, rather than by conventional serial rewrite rule systems.

1 HARMONIC GRAMMARS

In collaboration with Geraldine Legendre, Yoshiro Miyata, and Alan Prince, I have been studying how symbolic computation in human cognition can arise naturally as a higher-level virtual machine realized in appropriately designed lower-level connectionist networks. The basic computational principles of the approach are these:

(1) a. When analyzed at the lower level, mental representations are distributed patterns of connectionist activity; when analyzed at a higher level, these same representations constitute symbolic structures.
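As a hedged illustration of the optimization view (generic notation, not taken from the paper): a Harmonic Grammar assigns each candidate structure a harmony value, a weighted sum of its soft-constraint violations, and the well-formed structures are those that maximize it.

```latex
% Generic notation, not the paper's own: a is a candidate structure,
% C_k(a) the degree to which it violates soft constraint k, and
% w_k < 0 that constraint's weight (penalty).
\[
  H(a) = \sum_k w_k \, C_k(a), \qquad
  \text{well-formed structures} = \arg\max_a H(a).
\]
```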