AITopics

Many real world patterns have a hierarchical underlying structure in which simple features have a higher order structure among themselves. Because these relationships are often statistical in nature, it is natural to view the process of discovering such structures as a statistical inference problem in which a hierarchical model is fit to data. Hierarchical statistical structure can be conveniently represented with Bayesian belief networks (Pearl, 1988; Lauritzen and Spiegelhalter, 1988; Neal, 1992). These 530 M.S. Lewicki and T. 1. Sejnowski models are powerful, because they can capture complex statistical relationships among the data variables, and also mathematically convenient, because they allow efficient computation of the joint probability for any given set of model parameters. The joint probability density of a network of binary states is given by a product of conditional probabilities (1) where W is the weight matrix that parameterizes the model. Note that the probability ofan individual state Si depends only on its parents.

artificial intelligence, machine learning, representation, (15 more...)

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes

Pareigis, Stephan

The optimal control problem reduces to a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton Jacobi-Bellman (HJB-) type. Numerical analysis provides multigrid methodsfor this kind of equation. In the case of Learning Control, however,the systems of equations on the various grid-levels are obtained using observed information (transitions and local cost). To ensure consistency, special attention needs to be directed toward thetype of time and space discretization during the observation. Analgorithm for multi-grid observation is proposed.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: Europe > Germany (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Bonet, Jeremy S. De, Jr., Charles Lee Isbell, Viola, Paul A.

MIMIC: Finding Optima by Estimating Probability Densities

In many optimization problems, the structure of solutions reflects complex relationships between the different input parameters. For example, experience may tell us that certain parameters are closely related and should not be explored independently. Similarly, experience mayestablish that a subset of parameters must take on particular values. Any search of the cost landscape should take advantage of these relationships. We present MIMIC, a framework in which we analyze the global structure of the optimization landscape. Anovel and efficient algorithm for the estimation of this structure is derived. We use knowledge of this structure to guide a randomized search through the solution space and, in turn, to refine ourestimate ofthe structure.

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America > United States > Massachusetts (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Ordered Classes and Incomplete Examples in Classification

Mathieson, Mark

The classes in classification tasks often have a natural ordering, and the training and testing examples are often incomplete. We propose a nonlinear ordinalmodel for classification into ordered classes. Predictive, simulation-based approaches are used to learn from past and classify future incompleteexamples. These techniques are illustrated by making prognoses for patients who have suffered severe head injuries.

artificial intelligence, imputation, machine learning, (16 more...)

Country: Europe > United Kingdom > England (0.14)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)

Combinations of Weak Classifiers

Ji, Chuanyi, Ma, Sheng

To obtain classification systems with both good generalization performance andefficiency in space and time, we propose a learning method based on combinations of weak classifiers, where weak classifiers arelinear classifiers (perceptrons) which can do a little better than making random guesses. A randomized algorithm is proposed to find the weak classifiers. They· are then combined through a majority vote.As demonstrated through systematic experiments, the method developed is able to obtain combinations of weak classifiers with good generalization performance and a fast training time on a variety of test problems and real applications.

artificial intelligence, classifier, machine learning, (16 more...)

Country: North America > United States > California (0.14)

Genre: Research Report (0.47)

Industry: Education (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.32)

Dunmur, A. P., Titterington, D. M.

On a Modification to the Mean Field EM Algorithm in Factorial Learning

A modification is described to the use of mean field approximations inthe E step of EM algorithms for analysing data from latent structure models, as described by Ghahramani (1995), among others. Themodification involves second-order Taylor approximations to expectations computed in the E step. The potential benefits of the method are illustrated using very simple latent profile models. 1 Introduction Ghahramani (1995) advocated the use of mean field methods as a means to avoid the heavy computation involved in the E step of the EM algorithm used for estimating parameters within a certain latent structure model, and Ghahramani & Jordan (1995) used the same ideas in a more complex situation. Dunmur & Titterington (1996a) identified Ghahramani's model as a so-called latent profile model, they observed that Zhang (1992,1993) had used mean field methods for a similar purpose, and they showed, in a simulation study based on very simple examples, that the mean field version of the EM algorithm often performed very respectably. By this it is meant that, when data were generated from the model under analysis, the estimators of the underlying parameters were efficient, judging by empirical results, especially in comparison with estimators obtained by employing the'correct' EM algorithm: the examples therefore had to be simple enough that the correct EM algorithm is numerically feasible, although any success reported for the mean field 432 A. P. Dunmur and D. M. Titterington version is, one hopes, an indication that the method will also be adequate in more complex situations in which the correct EM algorithm is not implementable because of computational complexity. In spite of the above positive remarks, there were circumstances in which there was a perceptible, if not dramatic, lack of efficiency in the simple (naive) mean field estimators, and the objective of this contribution is to propose and investigate ways of refining the method so as to improve performance without detracting from the appealing, and frequently essential, simplicity of the approach. The procedure used here is based on a second order correction to the naive mean field well known in statistical physics and sometimes called the cavity or TAP method (Mezard, Parisi & Virasoro, 1987). It has been applied recently in cluster analysis (Hofmann & Buhmann, 1996). In Section 2 we introduce the structure of our model, Section 3 explains the refined mean field approach, Section 4 provides numerical results, and Section 5 contains a statement of our conclusions.

artificial intelligence, latent variable, machine learning, (16 more...)

Country:

North America > United States (0.28)
Asia > Middle East > Jordan (0.25)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Burges, Christopher J. C., Schölkopf, Bernhard

Improving the Accuracy and Speed of Support Vector Machines

Support Vector Learning Machines (SVM) are finding application in pattern recognition, regression estimation, and operator inversion forill-posed problems. Against this very general backdrop, any methods for improving the generalization performance, or for improving the speed in test phase, of SVMs are of increasing interest. Inthis paper we combine two such techniques on a pattern recognition problem. The method for improving generalization performance (the"virtual support vector" method) does so by incorporating known invariances of the problem. This method achieves a drop in the error rate on 10,000 NIST test digit images of 1.4% to 1.0%.

artificial intelligence, machine learning, vector, (16 more...)

Country:

Europe (0.28)
North America > United States > California > San Mateo County (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Singh, Satinder P., Dayan, Peter

Analytical Mean Squared Error Curves in Temporal Difference Learning

We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its stepsize andeligibility trace parameters. 1 INTRODUCTION A reassuring theory of asymptotic convergence is available for many reinforcement learning (RL) algorithms. What is not available, however, is a theory that explains the finite-term learning curve behavior of RL algorithms, e.g., what are the different kinds of learning curves, what are their key determinants, and how do different problem parameters effect rate of convergence. Answering these questions is crucial not only for making useful comparisons between algorithms, but also for developing hybrid and new RL methods. In this paper we provide preliminary answers to some of the above questions for the case of absorbing Markov chains, where mean square error between the estimated and true predictions is used as the quantity of interest in learning curves.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Fritsch, Jürgen, Finke, Michael, Waibel, Alex

Adaptively Growing Hierarchical Mixtures of Experts

We propose a novel approach to automatically growing and pruning Hierarchical Mixtures of Experts. The constructive algorithm proposed hereenables large hierarchies consisting of several hundred experts to be trained effectively. We show that HME's trained by our automatic growing procedure yield better generalization performance thantraditional static and balanced hierarchies. Evaluation of the algorithm is performed (1) on vowel classification and (2) within a hybrid version of the JANUS r9] speech recognition systemusing a subset of the Switchboard large-vocabulary speaker-independent continuous speech recognition database.

artificial intelligence, hme, machine learning, (19 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)

Padgett, Curtis, Cottrell, Garrison W.

Representing Face Images for Emotion Classification

Curtis Padgett Department of Computer Science University of California, San Diego La Jolla, CA 92034 GarrisonCottrell Department of Computer Science University of California, San Diego La Jolla, CA 92034 Abstract We compare the generalization performance of three distinct representation schemesfor facial emotions using a single classification strategy (neural network). The face images presented to the classifiers arerepresented as: full face projections of the dataset onto their eigenvectors (eigenfaces); a similar projection constrained to eye and mouth areas (eigenfeatures); and finally a projection of the eye and mouth areas onto the eigenvectors obtained from 32x32 random image patches from the dataset. The latter system achieves 86% generalization on novel face images (individuals the networks were not trained on) drawn from a database in which human subjects consistentlyidentify a single emotion for the face . 1 Introduction Some of the most successful research in machine perception of complex natural image objects (like faces), has relied heavily on reduction strategies that encode an object as a set of values that span the principal component subspace of the object's images [Cottrell and Metcalfe, 1991, Pentland et al., 1994]. This approach has gained wide acceptance for its success in classification, for the efficiency in which the eigenvectors can be calculated, and because the technique permits an implementation thatis biologically plausible. The procedure followed in generating these face representations requires normalizing a large set of face views (" mugshots") and from these, identifying a statistically relevant subspace.

artificial intelligence, machine learning, representation, (14 more...)

Country:

North America > United States > California > San Diego County > San Diego (0.45)
North America > United States > California > San Diego County > La Jolla (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)