AITopics

An adaptive online algorithm extending the "learning of learning" idea is proposed and theoretically motivated. Relying only on gradient flow information, it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. Its efficiency is demonstrated for a non-stationary blind separation task of acoustic signals.

algorithm, blind separation, loss function, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Germany > Berlin (0.04)
(2 more...)

Genre: Instructional Material > Online (0.40)

Industry: Education > Educational Setting > Online (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.53)

Munro, Paul W., Parmanto, Bambang

Competition Among Networks Improves Committee Performance

Since a neural network predictor inherently has an excessive number of parameters, reducing the prediction error is usually done by reducing variance. Methods for reducing neural network complexity can be viewed as a regularization technique to reduce this variance. Examples of such methods are Optimal Brain Damage (Le Cun et.

correlation, secondary unit, training signal, (15 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Miller, David J., Uyar, Hasan S.

A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data

We address statistical classifier design given a mixed training set consisting of a small labelled feature set and a (generally larger) set of unlabelled features. This situation arises, e.g., for medical images, where although training features may be plentiful, expensive expertise is required to extract their class labels. We propose a classifier structure and learning algorithm that make effective use of unlabelled data to improve performance. The learning is based on maximization of the total data likelihood, i.e., over both the labelled and unlabelled data subsets. Two distinct EM learning algorithms are proposed, differing in the EM formalism applied for unlabelled data. The classifier, based on a joint probability model for features and labels, is a "mixture of experts" structure that is equivalent to the radial basis function (RBF) classifier but, unlike RBFs, is amenable to likelihood-based training. The scope of application for the new method is greatly extended by the observation that test data, or any new data to classify, is in fact additional unlabelled data. Thus a combined learning/classification operation, much akin to what is done in image segmentation, can be invoked whenever there is new data to classify. Experiments with data sets from the UC Irvine database demonstrate that the new learning algorithms and structure achieve substantial performance gains over alternative approaches.
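The idea of maximizing the likelihood over labelled and unlabelled data together can be illustrated with a much simpler model than the one in the paper. The sketch below is not the authors' mixture-of-experts classifier: it assumes one 1-D Gaussian per class, keeps the class memberships of labelled points fixed, and re-estimates memberships only for unlabelled points. The function name `semi_supervised_em` and all parameter names are hypothetical, and at least one labelled example per class is assumed.

```python
import math


def semi_supervised_em(x_lab, y_lab, x_unl, n_classes, n_iter=50):
    """EM for a 1-D Gaussian mixture with one component per class.

    Simplifying assumption (not from the paper): each class is a single
    Gaussian, and every class has at least one labelled example.
    """
    # Labelled points: fixed one-hot responsibilities.
    r_lab = [[1.0 if k == y else 0.0 for k in range(n_classes)] for y in y_lab]
    # Unlabelled points: zero weight at first, so the initial
    # parameters come from the labelled data alone.
    r_unl = [[0.0] * n_classes for _ in x_unl]
    xs = list(x_lab) + list(x_unl)

    for _ in range(n_iter):
        # M-step: weighted priors, means and variances over all data.
        r = r_lab + r_unl
        weight = [sum(row[k] for row in r) for k in range(n_classes)]
        total = sum(weight)
        prior = [w / total for w in weight]
        mean = [sum(row[k] * x for row, x in zip(r, xs)) / weight[k]
                for k in range(n_classes)]
        # A small floor keeps the variance strictly positive.
        var = [sum(row[k] * (x - mean[k]) ** 2 for row, x in zip(r, xs))
               / weight[k] + 1e-6
               for k in range(n_classes)]

        # E-step: class posteriors for the unlabelled points only.
        r_unl = []
        for x in x_unl:
            log_p = [math.log(prior[k])
                     - 0.5 * math.log(2 * math.pi * var[k])
                     - 0.5 * (x - mean[k]) ** 2 / var[k]
                     for k in range(n_classes)]
            top = max(log_p)  # subtract the max for numerical stability
            p = [math.exp(v - top) for v in log_p]
            s = sum(p)
            r_unl.append([v / s for v in p])

    return prior, mean, var
```

Calling `semi_supervised_em(x_lab, y_lab, x_unl, n_classes=2)` returns the class priors, means and variances. New test points can be appended to `x_unl` before fitting, which mirrors the combined learning/classification operation described in the abstract.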

class label, classifier, unlabelled data, (13 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Pennsylvania > Centre County > University Park (0.04)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Meila, Marina, Jordan, Michael I.

Triangulation by Continuous Embedding

When triangulating a belief network we aim to obtain a junction tree of minimum state space. According to (Rose, 1970), searching for the optimal triangulation can be cast as a search over all the permutations of the graph's vertices. Our approach is to embed the discrete set of permutations in a convex continuous domain D. By suitably extending the cost function over D and solving the continuous nonlinear optimization task we hope to obtain a good triangulation with respect to the aforementioned cost. This paper presents two ways of embedding the triangulation problem into a continuous domain and shows that they perform well compared to the best known heuristic.
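The link between vertex permutations and triangulations can be made concrete with a small sketch. It shows only the discrete side of the problem: given an elimination order, it adds the fill-in edges and sums the state-space sizes of the cliques formed at each elimination step. This sum is a common surrogate and is an assumption here, since the abstract does not state the paper's exact cost. The continuous embedding itself is not shown. The function and argument names are hypothetical.

```python
from itertools import combinations


def triangulation_cost(adjacency, cardinality, order):
    """Cost and fill-in edges induced by eliminating vertices in `order`.

    adjacency:   dict mapping each vertex to a set of neighbours (undirected)
    cardinality: dict mapping each vertex to its number of states
    order:       a permutation of the vertices
    """
    # Work on a copy so the caller's graph is left untouched.
    adj = {v: set(ns) for v, ns in adjacency.items()}
    total = 0
    fill_edges = []
    for v in order:
        neighbours = adj[v]
        # Eliminating v forms a clique over v and its remaining neighbours;
        # its state space is the product of the cardinalities.
        size = cardinality[v]
        for u in neighbours:
            size *= cardinality[u]
        total += size
        # Connect the neighbours pairwise; new edges are fill-in edges.
        for a, b in combinations(sorted(neighbours), 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill_edges.append((a, b))
        # Remove v from the remaining graph.
        for u in neighbours:
            adj[u].discard(v)
        del adj[v]
    return total, fill_edges
```

For a four-vertex cycle with binary variables, any order adds exactly one chord. Evaluating this function over every permutation is the exhaustive search that becomes infeasible as the graph grows, which is the motivation for heuristics and for the continuous relaxation described above.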

graph, permutation, triangulation, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.06)
North America > United States > New York (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Mathieson, Mark

Ordered Classes and Incomplete Examples in Classification

The classes in classification tasks often have a natural ordering, and the training and testing examples are often incomplete. We propose a nonlinear ordinal model for classification into ordered classes. Predictive, simulation-based approaches are used to learn from past and classify future incomplete examples. These techniques are illustrated by making prognoses for patients who have suffered severe head injuries.

approximation, imputation, incomplete example, (13 more...)

Country:

North America > United States > New York (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)

Lewicki, Michael S., Sejnowski, Terrence J.

Bayesian Unsupervised Learning of Higher Order Structure

Many real world patterns have a hierarchical underlying structure in which simple features have a higher order structure among themselves. Because these relationships are often statistical in nature, it is natural to view the process of discovering such structures as a statistical inference problem in which a hierarchical model is fit to data. Hierarchical statistical structure can be conveniently represented with Bayesian belief networks (Pearl, 1988; Lauritzen and Spiegelhalter, 1988; Neal, 1992). These models are powerful, because they can capture complex statistical relationships among the data variables, and also mathematically convenient, because they allow efficient computation of the joint probability for any given set of model parameters.

bayesian unsupervised learning, probability, representation, (12 more...)

Country:

North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)

Kamimura, Ryotaro

Unification of Information Maximization and Minimization

In the present paper, we propose a method to unify information maximization and minimization in hidden units. Information maximization and minimization are performed on two different levels: the collective level and the individual level. Thus, two kinds of information are defined: collective information and individual information. By maximizing collective information and minimizing individual information, simple networks can be generated in terms of the number of connections and the number of hidden units. The networks obtained are expected to give better generalization and improved interpretation of internal representations.

individual information, information, information controller, (12 more...)

Country:

North America > United States > New York (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Jordan, Michael I., Ghahramani, Zoubin, Saul, Lawrence K.

Hidden Markov Decision Trees

We study a time series model that can be viewed as a decision tree with Markov temporal structure. The model is intractable for exact calculations, thus we utilize variational approximations. We consider three different distributions for the approximation: one in which the Markov calculations are performed exactly and the layers of the decision tree are decoupled, one in which the decision tree calculations are performed exactly and the time steps of the Markov chain are decoupled, and one in which a Viterbi-like assumption is made to pick out a single most likely state sequence.
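For readers unfamiliar with the "single most likely state sequence" idea, the sketch below shows the standard Viterbi recursion for a plain hidden Markov model. It is not the variational approximation from the paper and it ignores the decision-tree structure entirely. It only illustrates how one best path is selected by dynamic programming. The function and argument names are illustrative assumptions.

```python
import math


def viterbi(obs, init, trans, emit):
    """Most likely hidden-state path of a plain HMM, computed in log space.

    obs:   list of observation indices
    init:  init[s]     = P(first state is s)
    trans: trans[q][s] = P(next state is s | current state is q)
    emit:  emit[s][o]  = P(observation o | state s)
    """
    n_states = len(init)

    def log(p):
        # Zero probabilities map to -inf instead of raising an error.
        return math.log(p) if p > 0 else float("-inf")

    # Best log-score of any path ending in each state at time 0.
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1

    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda q: prev[q] + log(trans[q][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans[best][s]) + log(emit[s][o]))
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

The function returns one state index per observation. Committing to a single path in this way trades the full posterior over state sequences for a cheaper point estimate.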

approximation, decision tree, node, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.09)
North America > United States > California > San Mateo County > Redwood City (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Jaakkola, Tommi, Jordan, Michael I.

Recursive Algorithms for Approximating Probabilities in Graphical Models

We develop a recursive node-elimination formalism for efficiently approximating large probabilistic networks. No constraints are set on the network topologies. Yet the formalism can be straightforwardly integrated with exact methods whenever they are or become applicable. The approximations we use are controlled: they consistently maintain upper and lower bounds on the desired quantities at all times. We show that Boltzmann machines, sigmoid belief networks, or any combination (i.e., chain graphs) can be handled within the same framework.

boltzmann machine, partition function, recursion, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.08)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.38)

Fritsch, Jürgen, Finke, Michael, Waibel, Alex

Adaptively Growing Hierarchical Mixtures of Experts

We propose a novel approach to automatically growing and pruning Hierarchical Mixtures of Experts. The constructive algorithm proposed here enables large hierarchies consisting of several hundred experts to be trained effectively. We show that HMEs trained by our automatic growing procedure yield better generalization performance than traditional static and balanced hierarchies. Evaluation of the algorithm is performed (1) on vowel classification and (2) within a hybrid version of the JANUS [9] speech recognition system using a subset of the Switchboard large-vocabulary speaker-independent continuous speech recognition database.

hierarchical mixture, hme, probability, (17 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.90)