AITopics

In this setting, each pattern, represented as an n-dimensional feature vector, is associated with a discrete pattern class, or state of nature (Duda and Hart, 1973). Using available information, (e.g., a statistically representative set of labeled feature vectors

bayes risk, classification problem, classifier, (14 more...)

Country:

North America > United States > Vermont > Chittenden County > Burlington (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > Texas (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Learning the Structure of Similarity

Tenenbaum, Joshua B.

The additive clustering (ADCLUS) model (Shepard & Arabie, 1979) treats the similarity of two stimuli as a weighted additive measure of their common features. Inspired by recent work in unsupervised learning with multiple cause models, we propose a new, statistically well-motivated algorithm for discovering the structure of natural stimulus classes using the ADCLUS model, which promises substantial gains in conceptual simplicity, practical efficiency, and solution quality over earlier efforts.
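To make the additive rule concrete, here is a minimal Python sketch, assuming binary cluster memberships and one weight per cluster. The stimulus names, memberships, and weights are hypothetical illustration values, and the optional `bias` term is an assumption. This shows only the similarity rule described in the abstract, not the learning algorithm the paper proposes.

```python
def adclus_similarity(features_i, features_j, weights, bias=0.0):
    """Similarity as the summed weights of the features both stimuli share."""
    shared = sum(
        w
        for f_i, f_j, w in zip(features_i, features_j, weights)
        if f_i and f_j  # a feature counts only if both stimuli have it
    )
    return bias + shared


# Hypothetical binary memberships over three features (clusters).
membership = {
    "dog": [1, 1, 0],
    "cat": [1, 0, 1],
    "car": [0, 0, 1],
}
# Hypothetical feature weights.
weights = [0.5, 0.3, 0.2]

print(adclus_similarity(membership["dog"], membership["cat"], weights))  # 0.5
print(adclus_similarity(membership["cat"], membership["car"], weights))  # 0.2
print(adclus_similarity(membership["dog"], membership["car"], weights))  # 0.0
```

Fitting the model means choosing both the memberships and the weights so that these predicted similarities match observed ones, which is the harder problem the abstract addresses.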

algorithm, cluster configuration, similarity, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York (0.04)
North America > Canada > Ontario > Toronto (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.30)

Platt, John C., Allen, Timothy P.

A Neural Network Classifier for the I100 OCR Chip

Therefore, we want c to be less than 0.5. In order to get a 2:1 margin, we choose c = 0.25. The classifier is trained only on individual partial characters instead of all possible combinations of partial characters. Therefore, we can specify the classifier using only 1523 constraints, instead of creating a training set of approximately 128,000 possible combinations of partial characters. Applying these constraints is therefore much faster than back-propagation on the entire data set.

classifier, constraint, template, (14 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > California > Santa Clara County > San Jose (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.42)

A Practical Monte Carlo Implementation of Bayesian Learning

Rasmussen, Carl Edward

A practical method for Bayesian training of feed-forward neural networks using sophisticated Monte Carlo methods is presented and evaluated. In reasonably small amounts of computer time this approach outperforms other state-of-the-art methods on 5 data-limited tasks from real-world domains.

carlo method, hyperparameter, monte carlo method, (12 more...)

Country:

North America > Canada > Ontario > Toronto (0.16)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Ormoneit, Dirk, Tresp, Volker

Improved Gaussian Mixture Density Estimates Using Bayesian Penalty Terms and Network Averaging

We compare two regularization methods which can be used to improve the generalization capabilities of Gaussian mixture density estimates. The first method uses a Bayesian prior on the parameter space. We derive EM (Expectation Maximization) update rules which maximize the a posteriori parameter probability. In the second approach we apply ensemble averaging to density estimation. This includes Breiman's "bagging", which recently has been found to produce impressive results for classification networks.
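The ensemble-averaging idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's procedure: each ensemble member here is a single one-dimensional Gaussian fitted to a bootstrap resample (the paper averages Gaussian mixtures), and the names `bagged_density`, `n_models`, and `min_var` are hypothetical.

```python
import math
import random


def fit_gaussian(sample, min_var=1e-2):
    """Maximum-likelihood mean and variance, with a floor on the variance."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((x - mu) ** 2 for x in sample) / n
    return mu, max(var, min_var)


def gaussian_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def bagged_density(data, n_models=10, seed=0):
    """Average the densities of models fitted to bootstrap resamples."""
    rng = random.Random(seed)
    models = [
        # Bootstrap: draw len(data) points from data with replacement.
        fit_gaussian([rng.choice(data) for _ in data])
        for _ in range(n_models)
    ]

    def density(x):
        return sum(gaussian_pdf(x, mu, var) for mu, var in models) / len(models)

    return density


density = bagged_density([0.1, 0.4, -0.3, 1.2, 0.8, -0.9])
print(density(0.2))
```

Because the averaged estimate is a convex combination of normalized densities, it is itself a normalized density.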

improved gaussian mixture density estimate, parameter estimate, regularization, (9 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
North America > United States > California (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)

Jaakkola, Tommi, Saul, Lawrence K., Jordan, Michael I.

Fast Learning by Bounding Likelihoods in Sigmoid Type Belief Networks

Sigmoid type belief networks, a class of probabilistic neural networks, provide a natural framework for compactly representing probabilistic information in a variety of unsupervised and supervised learning problems. Often the parameters used in these networks need to be learned from examples. Unfortunately, estimating the parameters via exact probabilistic calculations (i.e., the EM algorithm) is intractable even for networks with fairly small numbers of hidden units. We propose to avoid the infeasibility of the E step by bounding likelihoods instead of computing them exactly. We introduce extended and complementary representations for these networks and show that the estimation of the network parameters can be made fast (reduced to quadratic optimization) by performing the estimation in either of the alternative domains.

bounding likelihood, probability, representation, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Asia > Middle East > Jordan (0.06)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Williams, Christopher K. I., Rasmussen, Carl Edward

Gaussian Processes for Regression

The Bayesian analysis of neural networks is difficult because a simple prior over weights implies a complex prior distribution over functions. In this paper we investigate the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis for fixed values of hyperparameters to be carried out exactly using matrix operations. Two methods, using optimization and averaging (via Hybrid Monte Carlo) over hyperparameters, have been tested on a number of challenging problems and have produced excellent results.

1 INTRODUCTION

In the Bayesian approach to neural networks a prior distribution over the weights induces a prior distribution over functions. This prior is combined with a noise model, which specifies the probability of observing the targets t given function values y, to yield a posterior over functions which can then be used for predictions. For neural networks the prior over functions has a complex form which means that implementations must either make approximations (e.g.
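As an illustration of how prediction with fixed hyperparameters reduces to matrix operations, the following Python sketch computes the standard Gaussian process predictive mean and variance at one test input. The squared-exponential covariance function, the hyperparameter values, and all function names are assumptions chosen for the example; the abstract does not specify them. A small Gaussian-elimination solver keeps the sketch dependency-free.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed form)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_predict(xs, ts, x_star, noise_var=1e-2):
    """Predictive mean and variance of the function value at x_star."""
    n = len(xs)
    # Covariance of the noisy targets: K + noise_var * I.
    K = [
        [sq_exp_kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    k_star = [sq_exp_kernel(x, x_star) for x in xs]
    alpha = solve(K, ts)   # (K + noise_var I)^{-1} t
    v = solve(K, k_star)   # (K + noise_var I)^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = sq_exp_kernel(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var


mean, var = gp_predict([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], x_star=1.0)
print(mean, var)
```

Far from the training inputs the prediction falls back to the prior (zero mean, variance `signal_var`); near them the variance shrinks toward the noise level.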

covariance function, gaussian process, hyperparameter, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Hofmann, Reimar, Tresp, Volker

Discovering Structure in Continuous Variables Using Bayesian Networks

We study Bayesian networks for continuous variables using nonlinear conditional density estimators. We demonstrate that useful structures can be extracted from a data set in a self-organized way and we present sampling techniques for belief update based on Markov blanket conditional density models.

bayesian network, conditional density model, density model, (12 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

A Unified Learning Scheme: Bayesian-Kullback Ying-Yang Machine

Xu, Lei

A Bayesian-Kullback learning scheme, called the Ying-Yang Machine, is proposed based on two complementary but equivalent Bayesian representations of the joint density and their Kullback divergence. Not only does the scheme unify major existing supervised and unsupervised learning methods, including classical maximum likelihood or least-squares learning, maximum information preservation, the EM & em algorithm and information geometry, the recently popular Helmholtz machine, as well as other learning methods with new variants and new results; it also provides a number of new learning models.

1 INTRODUCTION

Many different learning models have been developed in the literature. We may be entering an age of searching for a unified scheme for them. With a unified scheme, we may understand the existing models and their relationships more deeply, which may lead to cross-fertilization among them and yield new results and variants. We may also be guided to develop new learning models once we better understand which cases have already been studied and which have been missed and deserve further exploration. Recently, a Bayesian-Kullback scheme, called the YING-YANG Machine, has been proposed as such an effort (Xu, 1995a). It is based on the Kullback divergence and two complementary but equivalent Bayesian representations of the joint distribution of the input space and the representation space, instead of merely using the Kullback divergence to match unstructured joint densities as in information-geometry-type learning (Amari, 1995a&b; Byrne, 1992; Csiszar, 1975).

pm2, representation, ylx, (13 more...)

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
Asia > Taiwan (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)

Adaptive Mixture of Probabilistic Transducers

Singer, Yoram

We introduce and analyze a mixture model for supervised learning of probabilistic transducers. We devise an online learning algorithm that efficiently infers the structure and estimates the parameters of each model in the mixture. Theoretical analysis and comparative simulations indicate that the learning algorithm tracks the best model from an arbitrarily large (possibly infinite) pool of models. We also present an application of the model for inducing a noun phrase recognizer.

prediction, probability, suffix tree transducer, (14 more...)

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)