Technology
Monotonic Networks
Monotonicity is a constraint that arises in many application domains. We present a machine learning model, the monotonic network, for which monotonicity can be enforced exactly, i.e., by virtue of functional form. A straightforward method for implementing and training a monotonic network is described. Monotonic networks are proven to be universal approximators of continuous, differentiable monotonic functions. We apply monotonic networks to a real-world task in corporate bond rating prediction and compare them to other approaches.
1 Introduction
Several recent papers in machine learning have emphasized the importance of priors and domain-specific knowledge. In their well-known presentation of the bias-variance tradeoff (Geman and Bienenstock, 1992), Geman and Bienenstock conclude by arguing that the crucial issue in learning is the determination of the "right biases" which constrain the model in the appropriate way given the task at hand.
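A minimal sketch of how monotonicity can be guaranteed by functional form alone. It assumes a min-max arrangement of linear units (the maximum over groups of the minimum within each group) and positive weights obtained as exp(z) from unconstrained parameters z; both details are assumptions for illustration, and the function name is hypothetical. Each linear unit with positive weights is increasing in every input, and min and max preserve that property, so the output is monotonically increasing for any parameter values.

```python
import math


def monotonic_net(x, groups):
    """Max over groups of the min over linear units within each group.

    groups: list of groups; each group is a list of units (zs, b).
    Each unit computes sum_i exp(z_i) * x_i + b. Because exp(z_i) > 0,
    every unit is increasing in every input, and so is the min/max result.
    """
    group_values = []
    for units in groups:
        unit_values = [
            sum(math.exp(z) * xi for z, xi in zip(zs, x)) + b for zs, b in units
        ]
        group_values.append(min(unit_values))
    return max(group_values)


# Two groups over a 2-D input; any values of z and b keep the output monotone.
groups = [
    [([0.0, 0.0], -1.0), ([0.5, -1.0], 0.0)],
    [([-1.0, 0.2], 0.5)],
]
y = monotonic_net([0.3, -0.7], groups)
```

Since the constraint lives in the parameterization rather than in a penalty term, training can adjust z and b freely by gradient descent without ever producing a non-monotone model.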
Learning Continuous Attractors in Recurrent Networks
One approach to invariant object recognition employs a recurrent neural network as an associative memory. In the standard depiction of the network's state space, memories of objects are stored as attractive fixed points of the dynamics. I argue for a modification of this picture: if an object has a continuous family of instantiations, it should be represented by a continuous attractor. This idea is illustrated with a network that learns to complete patterns. To perform the task of filling in missing information, the network develops a continuous attractor that models the manifold from which the patterns are drawn.
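To make the idea of a continuous attractor concrete, here is a minimal sketch in which the pattern manifold is assumed to be a straight line through the origin, spanned by a unit vector u, and the recurrent weights are set by hand to the projection P = u uᵀ rather than learned. Every point t·u is a fixed point of x ← Px (a line of fixed points, not isolated ones), and components off the line vanish. Pattern completion clamps the observed entries and iterates the dynamics on the missing ones. The helper names are hypothetical.

```python
def projection(u):
    """Weight matrix P = u u^T projecting onto the line spanned by unit vector u."""
    return [[ui * uj for uj in u] for ui in u]


def complete(pattern, P, iters=200):
    """Fill None entries by iterating x <- P x on the missing coordinates only.

    Observed coordinates stay clamped; the missing ones relax toward the
    point on the attractor (the line) that agrees with the observed ones.
    """
    missing = [i for i, v in enumerate(pattern) if v is None]
    x = [0.0 if v is None else float(v) for v in pattern]
    n = len(x)
    for _ in range(iters):
        px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        for i in missing:
            x[i] = px[i]
    return x


u = [1 / 3, 2 / 3, 2 / 3]  # unit vector: (1, 2, 2) / 3
P = projection(u)
filled = complete([1.0, 2.0, None], P)  # missing entry converges to 2.0 (up to rounding)
```

Different observed values, such as [2.0, 4.0, None], settle at different points on the same line, which is exactly what distinguishes a continuous attractor from a set of isolated memories.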
Training Methods for Adaptive Boosting of Neural Networks
Schwenk, Holger, Bengio, Yoshua
"Boosting" is a general method for improving the performance of any learning algorithm that consistently generates classifiers which need to perform only slightly better than random guessing. A recently proposed and very promising boosting algorithm is AdaBoost [5]. It has been applied with great success to several benchmark machine learning problems using rather simple learning algorithms [4] and decision trees [1, 2, 6]. In this paper we use AdaBoost to improve the performance of neural networks. We compare training methods based on sampling the training set and on weighting the cost function. Our system achieves about 1.4% error on a database of online handwritten digits from more than 200 writers. Adaptive boosting of a multi-layer network achieved 1.5% error on the UCI Letters data set and 8.1% error on the UCI satellite data set.
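The two training methods compared above can be sketched with binary discrete AdaBoost. As an assumption for brevity, one-dimensional decision stumps stand in for the neural networks used in the paper, and labels are in {-1, +1}. `adaboost` passes the example weights directly into the base learner's error (weighting the cost function), while `resample` shows the alternative of drawing a new training set with probabilities given by the weights. All function names are hypothetical.

```python
import math
import random


def stump_predict(x, thr, s):
    """Decision stump: predict s for x >= thr, otherwise -s."""
    return s if x >= thr else -s


def best_stump(xs, ys, w):
    """Return (weighted error, thr, s) of the stump with the lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump_predict(xi, thr, s) != yi)
            if best is None or err < best[0]:
                best = (err, thr, s)
    return best


def adaboost(xs, ys, rounds):
    """Discrete AdaBoost that reweights the cost function seen by the base learner."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, s = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, s))
        # Misclassified examples gain weight; correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thr, s)) for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, thr, s) for alpha, thr, s in ensemble)
    return 1 if score >= 0 else -1


def resample(xs, ys, w, rng):
    """Alternative to reweighting: draw a same-size training set with probabilities w,
    then train the base learner unweighted on it."""
    idx = rng.choices(range(len(xs)), weights=w, k=len(xs))
    return [xs[i] for i in idx], [ys[i] for i in idx]


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single stump separates these labels
ens = adaboost(xs, ys, rounds=3)
```

On this toy set no single stump gets below 1/3 weighted error, yet three boosted stumps classify all six training points correctly. Reweighting is deterministic; resampling trades that for compatibility with base learners that cannot accept per-example weights.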
RCC Cannot Compute Certain FSA, Even with Arbitrary Transfer Functions
The proof given here shows that for any finite, discrete transfer function used by the units of a Recurrent Cascade-Correlation (RCC) network, there are finite-state automata (FSA) that the network cannot model, no matter how many units are used. The proof also applies to continuous transfer functions with a finite number of fixed points, such as sigmoid and radial-basis functions.
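As background for the result, a minimal sketch of the forward dynamics of an RCC network, assuming the usual cascade structure: hidden unit k sees the current input, the current outputs of the earlier units 0..k-1, and only its own output from the previous time step (a single self-recurrent connection). A hard-threshold transfer function is used so that each unit takes finitely many values, the setting the proof addresses. The weights are hand-set for illustration, and the function names are hypothetical; the sketch does not reproduce the proof itself.

```python
def step(a):
    """Hard-threshold transfer function with outputs in {0.0, 1.0}."""
    return 1.0 if a > 0 else 0.0


def rcc_run(inputs, units, f=step):
    """Run an RCC-style network on a 1-D input sequence.

    units[k] = (w_in, w_prev, w_self, bias): unit k combines the current
    input, the current outputs of units 0..k-1 (cascade weights w_prev),
    and its OWN previous output (w_self). No other recurrence exists.
    Returns the list of per-step output vectors.
    """
    prev = [0.0] * len(units)
    trace = []
    for x in inputs:
        cur = []
        for k, (w_in, w_prev, w_self, bias) in enumerate(units):
            a = w_in * x + sum(w * h for w, h in zip(w_prev, cur)) + w_self * prev[k] + bias
            cur.append(f(a))
        trace.append(cur)
        prev = cur
    return trace


# One self-recurrent unit acting as a latch: output becomes 1 after the first 1 and stays there.
latch = [(1.0, [], 1.0, -0.5)]
outputs = [h[0] for h in rcc_run([0, 0, 1, 0, 0], latch)]
```

The restriction that only a unit's own past output feeds back, rather than all units' past outputs, is what separates RCC from a fully recurrent network.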
An Incremental Nearest Neighbor Algorithm with Queries
We consider the general problem of learning multi-category classification from labeled examples. We present experimental results for a nearest neighbor algorithm which actively selects samples from different pattern classes according to a querying rule instead of the a priori class probabilities. The amount of improvement of this query-based approach over the passive batch approach depends on the complexity of the Bayes rule. The principle on which this algorithm is based is general enough to be used in any learning algorithm which permits a model-selection criterion and for which the error rate of the classifier is calculable in terms of the complexity of the model.
1 INTRODUCTION
We consider the general problem of learning multi-category classification from labeled examples. In many practical learning settings the time or sample size available for training is limited. This may have adverse effects on the accuracy of the resulting classifier. For instance, in learning to recognize handwritten characters, a typical time limitation confines the training sample size to the order of a few hundred examples. It is important to make learning more efficient by obtaining only training data that contain significant information about the separability of the pattern classes, thereby letting the learning algorithm participate actively in the sampling process. Querying for the class labels of specifically selected examples in the input space may lead to significant improvements in the generalization error (cf.
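The abstract does not state the querying rule, so the sketch below uses a hypothetical stand-in to show the overall loop: after an initial sample from each class, it repeatedly asks for one more example from the class whose stored examples have the highest leave-one-out 1-NN error, instead of sampling classes in proportion to their prior probabilities as a passive batch learner would. The one-dimensional Gaussian class samplers and all function names are assumptions for illustration.

```python
import random


def nn_classify(data, x):
    """1-NN prediction: label of the stored (point, label) pair closest to x."""
    return min(data, key=lambda p: abs(p[0] - x))[1]


def loo_error_by_class(data):
    """Leave-one-out 1-NN error rate for each class present in data."""
    errors, counts = {}, {}
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        counts[y] = counts.get(y, 0) + 1
        errors[y] = errors.get(y, 0) + (nn_classify(rest, x) != y)
    return {y: errors[y] / counts[y] for y in counts}


def active_sample(samplers, budget, init_per_class=2, seed=0):
    """Grow a labeled set by querying classes chosen by a (hypothetical) rule."""
    rng = random.Random(seed)
    data = [(draw(rng), y) for y, draw in samplers.items() for _ in range(init_per_class)]
    for _ in range(budget):
        err = loo_error_by_class(data)
        # Hypothetical rule: query the class whose examples are misclassified most often.
        y = max(err, key=err.get)
        data.append((samplers[y](rng), y))
    return data


# Assumed class-conditional samplers for two overlapping 1-D classes.
samplers = {0: lambda r: r.gauss(0.0, 1.0), 1: lambda r: r.gauss(2.0, 1.0)}
labeled = active_sample(samplers, budget=20)
```

The loop structure carries over to any rule that scores classes from the current model; only the line choosing `y` would change.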
Learning Path Distributions Using Nonequilibrium Diffusion Networks
Mineiro, Paul, Movellan, Javier R., Williams, Ruth J.
Department of Mathematics, University of California, San Diego, La Jolla, CA 92093-0112
We propose diffusion networks, a type of recurrent neural network with probabilistic dynamics, as models for learning natural signals that are continuous in time and space. We give a formula for the gradient of the log-likelihood of a path with respect to the drift parameters for a diffusion network. This gradient can be used to optimize diffusion networks in the nonequilibrium regime for a wide variety of problems, paralleling techniques which have succeeded in engineering fields such as system identification, state estimation and signal filtering. An aspect of this work which is of particular interest to computational neuroscience and hardware design is that, with a suitable choice of activation function (e.g., quasi-linear sigmoidal), the gradient formula is local in space and time.
1 Introduction
Many natural signals, like pixel gray-levels, line orientations, object position, velocity and shape parameters, are well described as continuous-time, continuous-valued stochastic processes; however, the neural network literature has seldom explored the continuous stochastic case. Since the solutions to many decision-theoretic problems of interest are naturally formulated using probability distributions, it is desirable to have a flexible framework for approximating probability distributions on continuous path spaces.
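A minimal sketch of a path log-likelihood gradient for a diffusion network, under assumptions not taken from the abstract: dynamics dX = μ(X) dt + σ dB with constant noise σ, a hypothetical drift μ_i(x) = -x_i + Σ_j w_ij tanh(x_j), and an Euler-Maruyama discretization. Differentiating the Gaussian transition log-density gives ∂ log L / ∂w_ij = Σ_t tanh(x_j(t)) (ΔX_i(t) − μ_i(X(t)) Δt) / σ², which depends only on the presynaptic activation of unit j and the innovation of unit i at the same time step, illustrating the kind of locality the abstract describes. The function names are hypothetical.

```python
import math
import random


def drift(x, w):
    """Hypothetical drift: mu_i(x) = -x_i + sum_j w[i][j] * tanh(x_j)."""
    n = len(x)
    return [-x[i] + sum(w[i][j] * math.tanh(x[j]) for j in range(n)) for i in range(n)]


def simulate(w, x0, sigma, dt, steps, seed=0):
    """Euler-Maruyama sample path of dX = mu(X) dt + sigma dB."""
    rng = random.Random(seed)
    path = [list(x0)]
    for _ in range(steps):
        x = path[-1]
        mu = drift(x, w)
        path.append([x[i] + mu[i] * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
                     for i in range(len(x))])
    return path


def log_likelihood(path, w, sigma, dt):
    """Sum of Gaussian transition log-densities of the discretized path (constants dropped)."""
    ll = 0.0
    for x, x_next in zip(path, path[1:]):
        mu = drift(x, w)
        ll -= sum((x_next[i] - x[i] - mu[i] * dt) ** 2 for i in range(len(x))) / (2 * sigma**2 * dt)
    return ll


def grad_w(path, w, sigma, dt):
    """d log L / d w[i][j] = sum_t tanh(x_j) * (dX_i - mu_i dt) / sigma^2."""
    n = len(w)
    g = [[0.0] * n for _ in range(n)]
    for x, x_next in zip(path, path[1:]):
        mu = drift(x, w)
        for i in range(n):
            innovation = x_next[i] - x[i] - mu[i] * dt  # local to unit i at time t
            for j in range(n):
                g[i][j] += math.tanh(x[j]) * innovation / sigma**2
    return g


w_true = [[0.0, 1.5], [-1.5, 0.0]]
path = simulate(w_true, [0.5, -0.5], sigma=0.3, dt=0.01, steps=500)
g = grad_w(path, [[0.0, 0.0], [0.0, 0.0]], sigma=0.3, dt=0.01)
```

Because this drift is linear in w, the discretized log-likelihood is quadratic in the weights, so the analytic gradient can be checked exactly against a central finite difference before handing it to any gradient-based optimizer.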
Estimating Dependency Structure as a Hidden Variable
Meila, Marina, Jordan, Michael I.
This paper introduces a probability model, the mixture of trees, which can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the maximum-likelihood (ML) and maximum a posteriori (MAP) mixture of trees for a variety of priors, including the Dirichlet and the MDL priors.
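The spanning-tree step can be sketched as a weighted Chow-Liu fit: compute the mutual information between every pair of variables under per-example weights, then keep a maximum-weight spanning tree. Inside EM for a mixture of trees, the weights for component k would be the posterior responsibilities of that component for each example; here that role is assumed, and the data are taken to be binary for brevity. Function names are hypothetical and the paper's exact procedure may differ.

```python
import math
from itertools import combinations


def weighted_mi(data, weights, i, j):
    """Mutual information between binary variables i and j under example weights."""
    total = sum(weights)
    pij, pi, pj = {}, {0: 0.0, 1: 0.0}, {0: 0.0, 1: 0.0}
    for row, wt in zip(data, weights):
        a, b = row[i], row[j]
        pij[(a, b)] = pij.get((a, b), 0.0) + wt / total
        pi[a] += wt / total
        pj[b] += wt / total
    return sum(p * math.log(p / (pi[a] * pj[b])) for (a, b), p in pij.items() if p > 0)


def chow_liu_tree(data, weights):
    """Maximum-weight spanning tree over pairwise MI (Kruskal with union-find)."""
    n = len(data[0])
    edges = sorted(
        ((weighted_mi(data, weights, i, j), i, j) for i, j in combinations(range(n), 2)),
        reverse=True,
    )
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


# x1 copies x0, x2 is independent of both; uniform weights play the role of responsibilities.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tree = chow_liu_tree(data, [1.0] * len(data))
```

With non-uniform weights the same function fits one mixture component, so an EM iteration amounts to recomputing responsibilities and calling it once per component.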