AITopics

Monotonicity is a constraint which arises in many application domains. We present a machine learning model, the monotonic network, for which monotonicity can be enforced exactly, i.e., by virtue of functional form. A straightforward method for implementing and training a monotonic network is described. Monotonic networks are proven to be universal approximators of continuous, differentiable monotonic functions. We apply monotonic networks to a real-world task in corporate bond rating prediction and compare them to other approaches.

1 Introduction

Several recent papers in machine learning have emphasized the importance of priors and domain-specific knowledge. In their well-known presentation of the bias-variance tradeoff (Geman and Bienenstock, 1992), Geman and Bienenstock conclude by arguing that the crucial issue in learning is the determination of the "right biases" which constrain the model in the appropriate way given the task at hand.

artificial intelligence, machine learning, monotonic network, (14 more...)

Country: North America > United States > California (0.14)

Industry: Banking & Finance (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)
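
To illustrate what enforcing monotonicity "by virtue of functional form" can look like, here is a minimal sketch. It assumes a construction that the abstract above does not spell out: linear units whose weights are kept positive by exponentiating free parameters, combined with a max inside each group and a min across groups. The function name `monotonic_net` and the parameter layout are hypothetical, and the sketch is not claimed to be the paper's exact architecture.

```python
import math


def monotonic_net(x, groups):
    """Evaluate a min-of-max network that is non-decreasing in every input.

    `groups` is a list of groups; each group is a list of (z, b) pairs,
    where z holds unconstrained parameters (one per input) and b is a bias.
    The effective weight on input i is exp(z[i]), which is always positive.
    (Hypothetical layout, chosen for illustration only.)
    """
    group_outputs = []
    for group in groups:
        unit_outputs = []
        for z, b in group:
            # Positive weights make each linear unit non-decreasing in x.
            unit = sum(math.exp(zi) * xi for zi, xi in zip(z, x)) + b
            unit_outputs.append(unit)
        # max of non-decreasing functions is non-decreasing.
        group_outputs.append(max(unit_outputs))
    # min of non-decreasing functions is non-decreasing as well.
    return min(group_outputs)


# Hypothetical parameters: two groups over two inputs.
example_groups = [
    [([0.0, -1.0], 0.5), ([1.0, 0.0], -2.0)],
    [([-0.5, 0.5], 1.0)],
]
value_at_origin = monotonic_net([0.0, 0.0], example_groups)
```

Because every weight is `exp(z)`, no setting of the free parameters can break monotonicity, so an unconstrained optimizer could be applied to `z` and `b` directly.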

Learning Continuous Attractors in Recurrent Networks

Seung, H. Sebastian

One approach to invariant object recognition employs a recurrent neural network as an associative memory. In the standard depiction of the network's state space, memories of objects are stored as attractive fixed points of the dynamics. I argue for a modification of this picture: if an object has a continuous family of instantiations, it should be represented by a continuous attractor. This idea is illustrated with a network that learns to complete patterns. To perform the task of filling in missing information, the network develops a continuous attractor that models the manifold from which the patterns are drawn.

artificial intelligence, attractor, machine learning, (16 more...)

Country: North America > United States (0.15)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)

Schwenk, Holger, Bengio, Yoshua

Training Methods for Adaptive Boosting of Neural Networks

"Boosting" is a general method for improving the performance of any learning algorithm that consistently generates classifiers which need to perform only slightly better than random guessing. A recently proposed and very promising boosting algorithm is AdaBoost [5]. It has been applied withgreat success to several benchmark machine learning problems using rather simple learning algorithms [4], and decision trees [1, 2, 6]. In this paper we use AdaBoost to improve the performances of neural networks. We compare training methods based on sampling the training set and weighting the cost function. Our system achieves about 1.4% error on a data base of online handwritten digits from more than 200 writers. Adaptive boosting of a multi-layer network achieved 1.5% error on the UCI Letters and 8.1 % error on the UCI satellite data set.

artificial intelligence, classifier, machine learning, (18 more...)

Country: North America > Canada (0.15)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.87)
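
The boosting entry above contrasts sampling the training set with weighting the cost function. The following minimal sketch shows the weighting variant of the standard AdaBoost update for labels in {-1, +1}. As a simplifying assumption it uses one-dimensional threshold stumps as the weak learner instead of the neural networks studied in the paper; the function names and the toy data are hypothetical.

```python
import math


def train_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump.

    The stump predicts `polarity` when x >= threshold and `-polarity`
    otherwise. The error is the total weight of misclassified examples,
    i.e. the weak learner minimizes a weighted cost.
    """
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                wi
                for wi, x, y in zip(w, xs, ys)
                if (polarity if x >= t else -polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Run AdaBoost for `rounds` iterations; return [(alpha, t, polarity)]."""
    n = len(xs)
    w = [1.0 / n] * n  # start from uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, t, polarity = train_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, t, polarity))
        # Increase the weight of mistakes, decrease the weight of hits.
        w = [
            wi * math.exp(-alpha * y * (polarity if x >= t else -polarity))
            for wi, x, y in zip(w, xs, ys)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): no single stump separates the two classes.
toy_xs = [0, 1, 2, 3, 4, 5]
toy_ys = [1, 1, -1, -1, 1, 1]
toy_ensemble = adaboost(toy_xs, toy_ys, rounds=3)
```

The sampling variant would instead draw a new training set according to `w` in each round and train the weak learner on that resampled set without weights.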

Schölkopf, Bernhard, Simard, Patrice, Smola, Alex J., Vapnik, Vladimir

Prior Knowledge in Support Vector Kernels

We explore methods for incorporating prior knowledge about a problem at hand in Support Vector learning machines. We show that both invariances under group transformations and prior knowledge about locality in images can be incorporated by constructing appropriate kernel functions.

artificial intelligence, invariance, machine learning, (18 more...)

Country: North America > United States (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.63)

EM Algorithms for PCA and SPCA

Roweis, Sam T.

I present an expectation-maximization (EM) algorithm for principal component analysis (PCA). The algorithm allows a few eigenvectors and eigenvalues to be extracted from large collections of high dimensional data. It is computationally very efficient in space and time.

algorithm, artificial intelligence, machine learning, (16 more...)

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
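
As a rough illustration of the EM approach to PCA described in the entry above, the sketch below extracts only the first principal direction. The update rules are not given in the abstract; the sketch assumes the commonly stated form, in which the E-step projects the centered data onto the current direction and the M-step refits the direction by least squares. The function name and the toy data are hypothetical.

```python
import math


def em_pca_first_component(data, iterations=50):
    """Estimate the leading principal direction of `data` with EM updates.

    `data` is a list of equal-length rows. Assumes the data are not all
    identical and that the starting direction is not orthogonal to the
    leading principal direction.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]

    c = [1.0] * d  # arbitrary non-zero starting direction
    for _ in range(iterations):
        # E-step: latent coordinate of each point, x_i = (c . y_i) / (c . c).
        cc = sum(v * v for v in c)
        x = [sum(ci * yi for ci, yi in zip(c, y)) / cc for y in centered]
        # M-step: direction that best reconstructs the data from x.
        xx = sum(v * v for v in x)
        c = [sum(x[i] * centered[i][j] for i in range(n)) / xx for j in range(d)]

    norm = math.sqrt(sum(v * v for v in c))
    return [v / norm for v in c]


# Toy data (hypothetical): points lying along the direction (1, 2).
toy_data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction = em_pca_first_component(toy_data)
```

Each iteration touches the data only through inner products with the current direction, so the full covariance matrix is never formed, which is consistent with the space and time efficiency the abstract emphasizes.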

RCC Cannot Compute Certain FSA, Even with Arbitrary Transfer Functions

Ring, Mark

The proof given here shows that for any finite, discrete transfer function used by the units of an RCC network, there are finite-state automata (FSA) that the network cannot model, no matter how many units are used. The proof also applies to continuous transfer functions with a finite number of fixed points, such as sigmoid and radial-basis functions.

artificial intelligence, machine learning, transfer function, (17 more...)

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

An Incremental Nearest Neighbor Algorithm with Queries

Ratsaby, Joel

We consider the general problem of learning multi-category classification from labeled examples. We present experimental results for a nearest neighbor algorithm which actively selects samples from different pattern classes according to a querying rule instead of the a priori class probabilities. The amount of improvement of this query-based approach over the passive batch approach depends on the complexity of the Bayes rule. The principle on which this algorithm is based is general enough to be used in any learning algorithm which permits a model-selection criterion and for which the error rate of the classifier is calculable in terms of the complexity of the model.

1 INTRODUCTION

We consider the general problem of learning multi-category classification from labeled examples. In many practical learning settings the time or sample size available for training is limited. This may have adverse effects on the accuracy of the resulting classifier. For instance, in learning to recognize handwritten characters, a typical time limitation confines the training sample size to be of the order of a few hundred examples. It is important to make learning more efficient by obtaining only training data which contains significant information about the separability of the pattern classes, thereby letting the learning algorithm participate actively in the sampling process. Querying for the class labels of specifically selected examples in the input space may lead to significant improvements in the generalization error (cf.

Mineiro, Paul, Movellan, Javier R., Williams, Ruth J.

Learning Path Distributions Using Nonequilibrium Diffusion Networks

Department of Mathematics, University of California, San Diego, La Jolla, CA 92093-0112

Abstract

We propose diffusion networks, a type of recurrent neural network with probabilistic dynamics, as models for learning natural signals that are continuous in time and space. We give a formula for the gradient of the log-likelihood of a path with respect to the drift parameters for a diffusion network. This gradient can be used to optimize diffusion networks in the nonequilibrium regime for a wide variety of problems, paralleling techniques which have succeeded in engineering fields such as system identification, state estimation and signal filtering. An aspect of this work which is of particular interest to computational neuroscience and hardware design is that with a suitable choice of activation function, e.g., quasi-linear sigmoidal, the gradient formula is local in space and time.

1 Introduction

Many natural signals, like pixel gray-levels, line orientations, object position, velocity and shape parameters, are well described as continuous-time, continuous-valued stochastic processes; however, the neural network literature has seldom explored the continuous stochastic case. Since the solutions to many decision theoretic problems of interest are naturally formulated using probability distributions, it is desirable to have a flexible framework for approximating probability distributions on continuous path spaces.

artificial intelligence, diffusion network, machine learning, (17 more...)

Country:

North America > United States > California > San Diego County > San Diego (0.25)
North America > United States > California > San Diego County > La Jolla (0.25)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.31)

Combining Classifiers Using Correspondence Analysis

Merz, Christopher J.

Next, C1 is used to form the indicator matrix, N. A correspondence analysis is performed on N to derive the scaled space, A

algorithm, artificial intelligence, machine learning, (12 more...)

Country: North America > United States > California > Orange County > Irvine (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.98)

Meila, Marina, Jordan, Michael I.

Estimating Dependency Structure as a Hidden Variable

This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Mateo County (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)