AITopics

We propose a novel approach to automatically growing and pruning Hierarchical Mixtures of Experts. The constructive algorithm proposed here enables large hierarchies consisting of several hundred experts to be trained effectively. We show that HME's trained by our automatic growing procedure yield better generalization performance than traditional static and balanced hierarchies. Evaluation of the algorithm is performed (1) on vowel classification and (2) within a hybrid version of the JANUS r9] speech recognition system using a subset of the Switchboard large-vocabulary speaker-independent continuous speech recognition database.

hierarchical mixture, hme, probability, (17 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.90)

Continuous Sigmoidal Belief Networks Trained using Slice Sampling

Frey, Brendan J.

These include Boltzmann machines (Hinton and Sejnowski 1986), binary sigmoidal belief networks (Neal 1992) and Helmholtz machines (Hinton et al. 1995; Dayan et al. 1995). However, some hidden variables, such as translation or scaling in images of shapes, are best represented using continuous values. Continuous-valued Boltzmann machines have been developed (Movellan and McClelland 1993), but these suffer from long simulation settling times and the requirement of a "negative phase" during learning. Tibshirani (1992) and Bishop et al. (1996) consider learning mappings from a continuous latent variable space to a higher-dimensional input space. MacKay (1995) has developed "density networks" that can model both continuous and categorical latent spaces using stochasticity at the topmost network layer. In this paper I consider a new hierarchical top-down connectionist model that has stochastic hidden variables at all layers; moreover, these variables can adapt to be continuous or categorical. The proposed top-down model can be viewed as a continuous-valued belief network, which can be simulated by performing a quick top-down pass (Pearl 1988).

belief network, continuous sigmoidal belief network, postsigmoid activity, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Dunmur, A. P., Titterington, D. M.

On a Modification to the Mean Field EM Algorithm in Factorial Learning

A modification is described to the use of mean field approximations in the E step of EM algorithms for analysing data from latent structure models, as described by Ghahramani (1995), among others. The modification involves second-order Taylor approximations to expectations computed in the E step. The potential benefits of the method are illustrated using very simple latent profile models.

algorithm, approximation, latent variable, (14 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Bonet, Jeremy S. De, Jr., Charles Lee Isbell, Viola, Paul A.

MIMIC: Finding Optima by Estimating Probability Densities

In many optimization problems, the structure of solutions reflects complex relationships between the different input parameters. For example, experience may tell us that certain parameters are closely related and should not be explored independently. Similarly, experience may establish that a subset of parameters must take on particular values. Any search of the cost landscape should take advantage of these relationships. We present MIMIC, a framework in which we analyze the global structure of the optimization landscape. A novel and efficient algorithm for the estimation of this structure is derived. We use knowledge of this structure to guide a randomized search through the solution space and, in turn, to refine our estimate ofthe structure.

algorithm, cost function, mimic, (14 more...)

Country:

Asia > Middle East > Jordan (0.06)
North America > United States > New York (0.04)
North America > United States > Michigan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Minimizing Statistical Bias with Queries

Cohn, David A.

I describe a querying criterion that attempts to minimize the error of a learner by minimizing its estimated squared bias. I describe experiments with locally-weighted regression on two simple problems, and observe that this "bias-only" approach outperforms the more common "variance-only" exploration approach, even in the presence of noise.

criterion, experiment, learner, (16 more...)

Country:

North America > United States > New York (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)

Burges, Christopher J. C., Schölkopf, Bernhard

Improving the Accuracy and Speed of Support Vector Machines

Support Vector Learning Machines (SVM) are finding application in pattern recognition, regression estimation, and operator inversion for ill-posed problems. Against this very general backdrop, any methods for improving the generalization performance, or for improving the speed in test phase, of SVMs are of increasing interest. In this paper we combine two such techniques on a pattern recognition problem. The method for improving generalization performance (the "virtual support vector" method) does so by incorporating known invariances of the problem. This method achieves a drop in the error rate on 10,000 NIST test digit images of 1.4% to 1.0%.

support vector, vapnik, vector, (14 more...)

Country:

North America > United States > New York (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Bishop, Christopher M., Quazaz, Cazhaow S.

Regression with Input-Dependent Noise: A Bayesian Treatment

In most treatments of the regression problem it is assumed that the distribution of target data can be described by a deterministic function of the inputs, together with additive Gaussian noise having constant variance. The use of maximum likelihood to train such models then corresponds to the minimization of a sum-of-squares error function. In many applications a more realistic model would allow the noise variance itself to depend on the input variables. However, the use of maximum likelihood to train such models would give highly biased results. In this paper we show how a Bayesian treatment can allow for an input-dependent variance while overcoming the bias of maximum likelihood.

input-dependent noise, noise variance, variance, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Consistent Classification, Firm and Soft

Baram, Yoram

A classifier is called consistent with respect to a given set of classlabeled points if it correctly classifies the set. We consider classifiers defined by unions of local separators and propose algorithms for consistent classifier reduction. The expected complexities of the proposed algorithms are derived along with the expected classifier sizes. In particular, the proposed approach yields a consistent reduction of the nearest neighbor classifier, which performs "firm" classification, assigning each new object to a class, regardless of the data structure. The proposed reduction method suggests a notion of "soft" classification, allowing for indecision with respect to objects which are insufficiently or ambiguously supported by the data. The performances of the proposed classifiers in predicting stock behavior are compared to that achieved by the nearest neighbor method.

classification, classifier, separator, (14 more...)

Country:

North America > United States (0.14)
Asia > Middle East > Israel > Haifa District > Haifa (0.05)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)

Industry: Banking & Finance > Trading (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Zeevi, Assaf J., Meir, Ron, Adler, Robert J.

Time Series Prediction using Mixtures of Experts

We consider the problem of prediction of stationary time series, using the architecture known as mixtures of experts (MEM). Here we suggest a mixture which blends several autoregressive models. This study focuses on some theoretical foundations of the prediction problem in this context. More precisely, it is demonstrated that this model is a universal approximator, with respect to learning the unknown prediction function. This statement is strengthened as upper bounds on the mean squared error are established. Based on these results it is possible to compare the MEM to other families of models (e.g., neural networks and state dependent models). It is shown that a degenerate version of the MEM is in fact equivalent to a neural network, and the number of experts in the architecture plays a similar role to the number of hidden units in the latter model.

mem, neural network, predictor function, (12 more...)

Country:

North America > United States > North Carolina > Orange County > Chapel Hill (0.14)
North America > United States > New York (0.05)
Asia > Middle East > Jordan (0.05)
(3 more...)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Microscopic Equations in Rough Energy Landscape for Neural Networks

Wong, K. Y. Michael

We consider the microscopic equations for learning problems in neural networks. The aligning fields of an example are obtained from the cavity fields, which are the fields if that example were absent in the learning process. In a rough energy landscape, we assume that the density of the local minima obey an exponential distribution, yielding macroscopic properties agreeing with the first step replica symmetry breaking solution. Iterating the microscopic equations provide a learning algorithm, which results in a higher stability than conventional algorithms. 1 INTRODUCTION Most neural networks learn iteratively by gradient descent. As a result, closed expressions for the final network state after learning are rarely known. This precludes further analysis of their properties, and insights into the design of learning algorithms.

energy landscape, landscape, microscopic equation, (10 more...)

Country:

Asia > Singapore (0.05)
North America > United States > New York (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.49)