AITopics

The wake-sleep algorithm (Hinton, Dayan, Frey and Neal 1995) is a relatively efficientmethod of fitting a multilayer stochastic generative model to high-dimensional data. In addition to the top-down connections inthe generative model, it makes use of bottom-up connections for approximating the probability distribution over the hidden units given the data, and it trains these bottom-up connections using a simple delta rule. We use a variety of synthetic and real data sets to compare the performance ofthe wake-sleep algorithm with Monte Carlo and mean field methods for fitting the same generative model and also compare it with other models that are less powerful but easier to fit. 1 INTRODUCTION Neural networks are often used as bottom-up recognition devices that transform input vectors intorepresentations of those vectors in one or more hidden layers. But multilayer networks ofstochastic neurons can also be used as top-down generative models that produce patterns with complicated correlational structure in the bottom visible layer. In this paper we consider generative models composed of layers of stochastic binary logistic units. Given a generative model parameterized by top-down weights, there is an obvious way to perform unsupervised learning. The generative weights are adjusted to maximize the probability thatthe visible vectors generated by the model would match the observed data.

artificial intelligence, helmholtz machine, machine learning, (13 more...)

Country:

North America > United States > Massachusetts (0.28)
North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.30)

Using Unlabeled Data for Supervised Learning

Towell, Geoffrey G.

Geoffrey Towell Siemens Corporate Research 755 College Road East Princeton, NJ 08540 Abstract Many classification problems have the property that the only costly part of obtaining examples is the class label. This paper suggests a simple method for using distribution information contained in unlabeled examples to augment labeled examples in a supervised training framework. Empirical tests show that the technique described inthis paper can significantly improve the accuracy of a supervised learner when the learner is well below its asymptotic accuracy level. 1 INTRODUCTION Supervised learning problems often have the following property: unlabeled examples have little or no cost while class labels have a high cost. For example, it is trivial to record hours of heartbeats from hundreds of patients. However, it is expensive to hire cardiologists to label each of the recorded beats.

artificial intelligence, inductive learning, machine learning, (19 more...)

Country: North America > United States > New Jersey > Mercer County > Princeton (0.24)

Genre: Research Report > New Finding (0.69)

Industry: Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.89)

Is Learning The n-th Thing Any Easier Than Learning The First?

Thrun, Sebastian

This paper investigates learning in a lifelong context. Lifelong learning addresses situations in which a learner faces a whole stream of learning tasks.Such scenarios provide the opportunity to transfer knowledge across multiple learning tasks, in order to generalize more accurately from less training data. In this paper, several different approaches to lifelong learning are described, and applied in an object recognition domain. It is shown that across the board, lifelong learning approaches generalize consistently more accurately from less training data, by their ability to transfer knowledge across learning tasks. 1 Introduction Supervised learning is concerned with approximating an unknown function based on examples. Virtuallyall current approaches to supervised learning assume that one is given a set of input-output examples, denoted by X, which characterize an unknown function, denoted by f.

artificial intelligence, knowledge, machine learning, (17 more...)

Country: North America > United States > California (0.15)

Genre:

Overview (0.74)
Research Report (0.54)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Opitz, David W., Shavlik, Jude W.

Generating Accurate and Diverse Members of a Neural-Network Ensemble

In particular, combining separately trained neural networks (commonly referred to as a neural-network ensemble) has been demonstrated to be particularly successful (Alpaydin, 1993; Drucker et al., 1994; Hansen and Salamon, 1990; Hashem et al., 1994; Krogh and Vedelsby, 1995; Maclin and Shavlik, 1995; Perrone, 1992). Both theoretical (Hansen and Salamon, 1990;Krogh and Vedelsby, 1995) and empirical (Hashem et al., 1994; 536 D.W. OPITZ, J. W. SHAVLIK Maclin and Shavlik, 1995) work has shown that a good ensemble is one where the individual networks are both accurate and make their errors on different parts of the input space; however, most previous work has either focussed on combining the output of multiple trained networks or only indirectly addressed how we should generate a good set of networks.

artificial intelligence, ensemble, machine learning, (16 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.15)
North America > United States > Minnesota > St. Louis County > Duluth (0.14)
North America > United States > Minnesota > Saint Louis County > Duluth (0.14)

Genre: Research Report (0.69)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Hinton, Geoffrey E., Revow, Michael

Using Pairs of Data-Points to Define Splits for Decision Trees

CART either split the data using axis-aligned hyperplanes or they perform a computationally expensivesearch in the continuous space of hyperplanes with unrestricted orientations. We show that the limitations of the former can be overcome without resorting to the latter. For every pair of training data-points, there is one hyperplane that is orthogonal tothe line joining the data-points and bisects this line. Such hyperplanes are plausible candidates for splits. In a comparison on a suite of 12 datasets we found that this method of generating candidate splits outperformed the standard methods, particularly when the training sets were small. 1 Introduction Binary decision trees come in many flavours, but they all rely on splitting the set of k-dimensional data-points at each internal node into two disjoint sets.

artificial intelligence, decision tree learning, machine learning, (19 more...)

Country: North America > Canada > Ontario > Toronto (0.16)

Genre: Research Report (0.48)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.63)

Konig, Yochai, Bourlard, Hervé, Morgan, Nelson

REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities - Application to Transition-Based Connectionist Speech Recognition

In this paper, we introduce REMAP, an approach for the training and estimation of posterior probabilities using a recursive algorithm that is reminiscent of the EMbased Forward-Backward (Liporace 1982) algorithm for the estimation of sequence likelihoods. Although verygeneral, the method is developed in the context of a statistical model for transition-based speech recognition using Artificial NeuralNetworks (ANN) to generate probabilities for Hidden Markov Models (HMMs). In the new approach, we use local conditional posterior probabilities of transitions to estimate global posterior probabilities of word sequences. Although we still use ANNs to estimate posterior probabilities, the network is trained with targets that are themselves estimates of local posterior probabilities. Aninitial experimental result shows a significant decrease in error-rate in comparison to a baseline system. 1 INTRODUCTION The ultimate goal in speech recognition is to determine the sequence of words that has been uttered.

artificial intelligence, machine learning, probability, (13 more...)

Country: North America > United States (0.29)

Genre: Research Report > New Finding (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)

West, Ansgar H. L., Saad, David

Adaptive Back-Propagation in On-Line Learning of Multilayer Networks

This research has been motivated by the dominance of the suboptimal symmetric phase in online learning of two-layer feedforward networks trained by gradient descent [2]. This trapping is emphasized for inappropriate small learning rates but exists in all training scenarios, effecting the learning process considerably. We Adaptive Back-Propagation in Online Learning of Multilayer Networks 329 proposed an adaptive back-propagation training algorithm [Eq.

artificial intelligence, gradient descent, machine learning, (12 more...)

Genre: Instructional Material > Online (0.40)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Human Reading and the Curse of Dimensionality

Martin, Gale

Whereas optical character recognition (OCR) systems learn to classify singlecharacters; people learn to classify long character strings in parallel, within a single fixation. This difference is surprising because high dimensionality is associated with poor classification learning. This paper suggests that the human reading system avoids these problems because the number of to-be-classified images isreduced by consistent and optimal eye fixation positions, and by character sequence regularities. An interesting difference exists between human reading and optical character recognition (OCR)systems. The input/output dimensionality of character classification in human reading is much greater than that for OCR systems (see Figure 1) . OCR systems classify one character at time; while the human reading system classifies as many as 8-13 characters per eye fixation (Rayner, 1979) and within a fixation, character category and sequence information is extracted in parallel (Blanchard, McConkie, Zola, and Wolverton, 1984; Reicher, 1969).

artificial intelligence, machine learning, optical character recognition, (16 more...)

Country: North America > United States (0.16)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

The Innovative Applications of Artificial Intelligence Conference: Past and Future

Shrobe, Howard

AI MagazineDec-15-1996

This article is a reflection on the goals and focus of the Innovative Applications of Artificial Intelligence (IAAI) Conference. The author begins with an historical review of the conference. He then goes on to discuss the role of the IAAI conference, including an examination of the relationship between AI scientific research and the application of AI technology. He concludes with a presentation of the new vision for the IAAI conference.

application, upstream oil & gas, us government, (17 more...)

AI Magazine

Country:

North America > United States > Oregon (0.14)
North America > United States > California (0.14)
Asia (0.14)

Genre: Overview > Innovation (0.71)

Industry:

Energy > Oil & Gas > Upstream (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Applied AI (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.46)