AITopics

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Sommer, Friedrich T., Palm, Günther

Bidirectional Retrieval from Associative Memory

Similarity-based, fault-tolerant retrieval in neural associative memories (NAM) has not led to widespread applications. A drawback of the efficient Willshaw model for sparse patterns [Ste61, WBLH69] is that its high asymptotic information capacity is of little practical use, because retrieval at finite sizes suffers from high crosstalk noise. Here a new bidirectional iterative retrieval method for the Willshaw model is presented, called crosswise bidirectional (CB) retrieval, which provides enhanced performance. We discuss its asymptotic capacity limit, analyze the first step, and compare it in experiments with the Willshaw model. Applying the very efficient CB memory model, either in information retrieval systems or as a functional model for reciprocal cortico-cortical pathways, requires more than robustness against random noise in the input: our experiments also show that CB retrieval can segment addresses containing a superposition of patterns, even at high memory load.

1 INTRODUCTION From a technical point of view, neural associative memories (NAM) provide data storage and retrieval.
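To make the Willshaw storage and threshold rule concrete, here is a minimal NumPy sketch of the classic model: clipped Hebbian binary weights, and one-step retrieval with the threshold set to the number of active address units. It adds a plain forward-backward iteration through W and its transpose. That loop is a generic illustration assumed for this example; it is not the crosswise bidirectional (CB) rule of the paper, whose update is not specified in this abstract. All function names and toy patterns are hypothetical.

```python
import numpy as np

def willshaw_store(X, Y):
    """Clipped Hebbian storage: W[i, j] = 1 if some stored pair has X[mu, i] = Y[mu, j] = 1."""
    # X: (M, m) binary addresses, Y: (M, n) binary contents, one pattern pair per row
    return (X.T @ Y > 0).astype(int)

def willshaw_retrieve(W, x):
    """One-step retrieval: fire every output unit whose dendritic sum reaches the address activity."""
    theta = x.sum()                      # classic Willshaw threshold = number of active address units
    if theta == 0:
        return np.zeros(W.shape[1], dtype=int)
    s = x @ W                            # dendritic sums of the output units
    return (s >= theta).astype(int)

def bidirectional_retrieve(W, x, steps=3):
    """Plain forward-backward iteration through W and W.T (hypothetical, NOT the paper's CB rule)."""
    y = willshaw_retrieve(W, x)
    for _ in range(steps):
        x = willshaw_retrieve(W.T, y)    # backward pass refines the address
        y = willshaw_retrieve(W, x)      # forward pass recomputes the content
    return x, y
```

With the non-overlapping toy patterns used below there is no crosstalk, so a partial address with a single active unit recovers both the stored content and, via the backward pass, the full address. With many overlapping sparse patterns, the extra units switched on by crosstalk are exactly the finite-size noise the abstract refers to.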

information retrieval, machine learning, natural language

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.84)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.84)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Smyth, Padhraic, Wolpert, David

Stacked Density Estimation

The components g_j are usually relatively simple unimodal densities such as Gaussians. Density estimation with mixtures involves finding the locations, shapes, and weights of the component densities from the data (using, for example, the Expectation-Maximization (EM) procedure). Kernel density estimation can be viewed as a special case of mixture modeling in which a component is centered at each data point and given a weight of 1/N, and a common covariance structure (kernel shape) is estimated from the data. The quality of a particular probabilistic model can be evaluated by an appropriate scoring rule on independent out-of-sample data, such as the test-set log-likelihood (also referred to as the log-scoring rule in the Bayesian literature).
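The kernel-density view and the log-scoring rule can be sketched together in a few lines of NumPy. The sketch below treats a 1-D Gaussian KDE as a mixture with one component per training point, each weighted 1/N, and picks the common kernel width by the held-out log-likelihood. Choosing the width on a single held-out split is an assumption made for illustration; the function names are hypothetical and this is not the paper's stacking procedure.

```python
import numpy as np

def kde_logpdf(x_eval, data, h):
    """Log-density of a 1-D Gaussian KDE: one component per data point, each weighted 1/N."""
    z = (x_eval[:, None] - data[None, :]) / h
    log_k = -0.5 * z**2 - np.log(h * np.sqrt(2 * np.pi))   # log N(x | x_i, h^2)
    m = log_k.max(axis=1, keepdims=True)                    # log-sum-exp for numerical stability
    return m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(len(data))

def heldout_log_score(train, test, h):
    """Log-scoring rule: total log-likelihood of out-of-sample points."""
    return kde_logpdf(test, train, h).sum()

def select_bandwidth(train, test, candidates):
    """Pick the kernel width (the shared 'kernel shape') with the best out-of-sample log score."""
    scores = [heldout_log_score(train, test, h) for h in candidates]
    return candidates[int(np.argmax(scores))]
```

For example, `select_bandwidth(train, test, [0.05, 0.2, 0.5, 1.0, 3.0])` on two independent standard-normal samples returns whichever candidate scores best on `test`. Working in log space avoids underflow when many kernels contribute vanishingly small densities.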

artificial intelligence, machine learning, mixture model

Country: North America > United States > California > Orange County > Irvine (0.14)

Industry:

Government > Space Agency (0.47)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Monotonic Networks

Sill, Joseph

Monotonicity is a constraint that arises in many application domains. We present a machine learning model, the monotonic network, for which monotonicity can be enforced exactly, i.e., by virtue of functional form. A straightforward method for implementing and training a monotonic network is described. Monotonic networks are proven to be universal approximators of continuous, differentiable monotonic functions. We apply monotonic networks to a real-world task in corporate bond rating prediction and compare them to other approaches.

1 Introduction Several recent papers in machine learning have emphasized the importance of priors and domain-specific knowledge. In their well-known presentation of the bias-variance tradeoff (Geman and Bienenstock, 1992), Geman and Bienenstock conclude by arguing that the crucial issue in learning is the determination of the "right biases" which constrain the model in the appropriate way given the task at hand.
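"Monotonicity by virtue of functional form" can be illustrated with a min-max network: groups of linear hyperplanes whose weights have a fixed sign per input, a max within each group, and a min across groups. Since each hyperplane is monotone in every input and max and min preserve monotonicity, the output is monotone for any parameter values. The sketch below assumes this min-max form and a sign-times-exp weight parameterization; the shapes and names are hypothetical, and training (e.g. gradient descent on `Z` and `b`) is omitted.

```python
import numpy as np

def monotonic_net(x, Z, b, signs):
    """Forward pass of a min-max monotonic network (sketch).

    x: (d,) input; Z: (K, J, d) unconstrained weight parameters;
    b: (K, J) biases; signs: (d,) +1 where the output must increase with x_k, -1 where it must decrease.
    """
    W = signs * np.exp(Z)        # exp makes every |w| > 0; signs fix each input's direction
    h = W @ x + b                # (K, J): J hyperplanes in each of K groups
    return h.max(axis=1).min()   # max within each group, then min across groups
```

Because the constraint lives in the parameterization rather than in a penalty, any optimizer can update `Z` freely without ever producing a non-monotonic model.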

artificial intelligence, machine learning, monotonic network

Country: North America > United States > California (0.14)

Industry: Banking & Finance (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Learning Continuous Attractors in Recurrent Networks

Seung, H. Sebastian

One approach to invariant object recognition employs a recurrent neural network as an associative memory. In the standard depiction of the network's state space, memories of objects are stored as attractive fixed points of the dynamics. I argue for a modification of this picture: if an object has a continuous family of instantiations, it should be represented by a continuous attractor. This idea is illustrated with a network that learns to complete patterns. To perform the task of filling in missing information, the network develops a continuous attractor that models the manifold from which the patterns are drawn.
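A toy version of the idea fits in a few lines: if all patterns lie on the line {a·u}, the linear recurrent weights W = u uᵀ/|u|² make every point on that line a fixed point, i.e. a continuous (line) attractor rather than isolated fixed points. Clamping the observed units and iterating the dynamics on the rest then fills in the missing values. This hand-built linear network is an assumption for illustration only; it is not the learned network of the paper, and the function names are hypothetical.

```python
import numpy as np

def line_attractor(u):
    """Recurrent weights W = u u^T / |u|^2: every point a*u on the line is a fixed point of x -> W x."""
    u = np.asarray(u, dtype=float)
    return np.outer(u, u) / (u @ u)

def complete(W, x0, observed, steps=500):
    """Pattern completion: clamp the observed units, let the recurrent dynamics fill in the rest."""
    x = np.array(x0, dtype=float)
    free = ~observed
    for _ in range(steps):
        x[free] = (W @ x)[free]   # update only the unobserved units
    return x
```

With `u = [1, 2, 3]` and only the first unit observed at value 2, the free units converge to 4 and 6, the point on the attractor consistent with the clamped value.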

artificial intelligence, attractor, machine learning

Country: North America > United States (0.15)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)

Schwenk, Holger, Bengio, Yoshua

Training Methods for Adaptive Boosting of Neural Networks

"Boosting" is a general method for improving the performance of any learning algorithm that consistently generates classifiers which need to perform only slightly better than random guessing. A recently proposed and very promising boosting algorithm is AdaBoost [5]. It has been applied with great success to several benchmark machine learning problems using rather simple learning algorithms [4] and decision trees [1, 2, 6]. In this paper we use AdaBoost to improve the performance of neural networks. We compare training methods based on sampling the training set and weighting the cost function. Our system achieves about 1.4% error on a database of online handwritten digits from more than 200 writers. Adaptive boosting of a multi-layer network achieved 1.5% error on the UCI Letters data set and 8.1% error on the UCI satellite data set.
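The reweighting loop at the heart of AdaBoost can be sketched with decision stumps standing in for the neural networks of the paper (the stump learner is an assumption chosen to keep the example short). This version corresponds to the "weighting the cost function" variant: the example weights enter the weak learner's error directly. All function names are hypothetical.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: choose feature, threshold and polarity with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for s in (1, -1):
                pred = np.where(X[:, f] <= t, s, -s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, f, t, s)
    return best

def stump_predict(X, f, t, s):
    return np.where(X[:, f] <= t, s, -s)

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}, using example weights in the learner's cost."""
    w = np.full(len(y), 1.0 / len(y))   # start from uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, f, t, s = fit_stump(X, y, w)
        if err >= 0.5:
            break                        # weak learner is no better than chance
        err = max(err, 1e-12)            # guard against log(1/0) for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * stump_predict(X, f, t, s))  # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, f, t, s))
    return ensemble

def predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    return np.sign(sum(a * stump_predict(X, f, t, s) for a, f, t, s in ensemble))
```

The "sampling the training set" variant would instead draw a bootstrap sample each round, e.g. `idx = rng.choice(len(y), size=len(y), p=w)`, and train an unweighted learner on `X[idx], y[idx]`; only the way `w` reaches the learner changes.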

artificial intelligence, classifier, machine learning

Country: North America > Canada (0.15)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.87)

Schölkopf, Bernhard, Simard, Patrice, Smola, Alex J., Vapnik, Vladimir

Prior Knowledge in Support Vector Kernels

We explore methods for incorporating prior knowledge about the problem at hand in Support Vector learning machines. We show that both invariances under group transformations and prior knowledge about locality in images can be incorporated by constructing appropriate kernel functions.
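One way to build locality into a kernel is to let only nearby pixels interact before the usual polynomial nonlinearity: multiply the images pixel-wise, sum the products inside each local window, raise each window sum to a power d1, add the results, and raise the total to a power d2. The sketch below assumes uniform weights within square windows (a weighted window is a natural alternative) and hypothetical parameter names; with window=1 and d1=1 it reduces to the ordinary polynomial kernel (x·y)^d2.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def locality_improved_kernel(x, y, window=3, d1=2, d2=1):
    """Sketch of a locality-improved kernel on two 2-D images of the same shape.

    Pixel-wise products are summed over each window (uniform weights assumed),
    each window sum is raised to d1, and the sum over windows is raised to d2.
    """
    prod = x * y
    patches = sliding_window_view(prod, (window, window))  # (H-w+1, W-w+1, w, w)
    local = patches.sum(axis=(-2, -1))                      # one linear kernel per window
    return (local ** d1).sum() ** d2
```

For positive integer d1 and d2 this stays a valid kernel: each window sum is a linear kernel on a patch, and powers and sums of kernels are kernels, so Gram matrices remain positive semidefinite.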

artificial intelligence, invariance, machine learning

Country: North America > United States (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.63)

EM Algorithms for PCA and SPCA

Roweis, Sam T.

I present an expectation-maximization (EM) algorithm for principal component analysis (PCA). The algorithm allows a few eigenvectors and eigenvalues to be extracted from large collections of high-dimensional data. It is computationally very efficient in space and time.
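In the zero-noise limit, EM for PCA alternates two least-squares steps on centered data Y (d × n) and a loading matrix C (d × k): the E-step computes X = (CᵀC)⁻¹CᵀY and the M-step sets C = YXᵀ(XXᵀ)⁻¹. Only d × k and k × n matrices are formed, never the d × d covariance. The sketch below assumes this formulation and then recovers individual eigenvectors by solving a small k × k eigenproblem in the learned subspace; the function name and defaults are hypothetical.

```python
import numpy as np

def em_pca(Y, k, iters=100, seed=0):
    """EM for PCA in the zero-noise limit. Y: (d, n) centered data; k: number of components.
    Returns (U, evals): top-k eigenvectors (d, k) and eigenvalues of the sample covariance."""
    d, n = Y.shape
    C = np.random.default_rng(seed).standard_normal((d, k))   # random initial loadings
    for _ in range(iters):
        X = np.linalg.solve(C.T @ C, C.T @ Y)      # E-step: X = (C^T C)^{-1} C^T Y, shape (k, n)
        C = np.linalg.solve(X @ X.T, X @ Y.T).T    # M-step: C = Y X^T (X X^T)^{-1}, shape (d, k)
    Q, _ = np.linalg.qr(C)                          # orthonormal basis of the learned subspace
    S = Q.T @ Y                                     # data projected into the k-dim subspace
    evals, V = np.linalg.eigh(S @ S.T / n)          # small k x k eigenproblem
    order = np.argsort(evals)[::-1]                 # sort components by decreasing variance
    return Q @ V[:, order], evals[order]
```

EM converges to the principal subspace but not to a particular basis within it, which is why the final QR and small eigendecomposition are needed to read off ordered eigenvectors and eigenvalues.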

algorithm, artificial intelligence, machine learning

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

RCC Cannot Compute Certain FSA, Even with Arbitrary Transfer Functions

Ring, Mark

The proof given here shows that for any finite, discrete transfer function used by the units of a recurrent cascade-correlation (RCC) network, there are finite-state automata (FSA) that the network cannot model, no matter how many units are used. The proof also applies to continuous transfer functions with a finite number of fixed points, such as sigmoid and radial-basis functions.
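To see what restriction the proof exploits, it helps to write out the RCC forward pass. As we understand the architecture, each hidden unit receives the current inputs and the current outputs of all earlier-installed units, but its only recurrent connection is to its own previous output. The sketch below assumes that connectivity; names and shapes are hypothetical, and it does not reproduce the proof. The test uses a step function (a finite, discrete transfer function) to build a one-unit latch, an FSA that RCC can model.

```python
import numpy as np

def rcc_forward(inputs, W_in, W_prev, w_self, f):
    """Forward pass of a recurrent cascade-correlation (RCC) hidden layer (sketch).

    inputs: (T, d) input sequence; W_in: (K, d) input weights; W_prev: (K, K) weights from
    earlier hidden units (only entries j < k are used); w_self: (K,) self-recurrent weights;
    f: transfer function. Unit k sees the inputs, units 0..k-1 at the same time step,
    and only its OWN previous output (no other recurrent connections).
    """
    T, K = len(inputs), len(w_self)
    H = np.zeros((T, K))
    prev = np.zeros(K)                  # all unit outputs start at 0
    for t in range(T):
        for k in range(K):
            net = W_in[k] @ inputs[t] + W_prev[k, :k] @ H[t, :k] + w_self[k] * prev[k]
            H[t, k] = f(net)
        prev = H[t]
    return H
```

Because every unit's memory of the past passes through a single self-loop and a transfer function with finitely many values (or fixed points), the reachable state dynamics are constrained no matter how many units are cascaded; this is the kind of limitation the abstract refers to.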

artificial intelligence, machine learning, transfer function

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

An Incremental Nearest Neighbor Algorithm with Queries

Ratsaby, Joel

We consider the general problem of learning multi-category classification from labeled examples. We present experimental results for a nearest-neighbor algorithm which actively selects samples from different pattern classes according to a querying rule instead of the a priori class probabilities. The amount of improvement of this query-based approach over the passive batch approach depends on the complexity of the Bayes rule. The principle on which this algorithm is based is general enough to be used in any learning algorithm which permits a model-selection criterion and for which the error rate of the classifier is calculable in terms of the complexity of the model.

1 INTRODUCTION We consider the general problem of learning multi-category classification from labeled examples. In many practical learning settings the time or sample size available for training is limited. This may have adverse effects on the accuracy of the resulting classifier. For instance, in learning to recognize handwritten characters, typical time limitations confine the training sample size to the order of a few hundred examples. It is important to make learning more efficient by obtaining only training data that contain significant information about the separability of the pattern classes, thereby letting the learning algorithm participate actively in the sampling process. Querying for the class labels of specifically selected examples in the input space may lead to significant improvements in the generalization error (cf.