
Does the Wake-sleep Algorithm Produce Good Density Estimators?

Neural Information Processing Systems

The wake-sleep algorithm (Hinton, Dayan, Frey and Neal 1995) is a relatively efficient method of fitting a multilayer stochastic generative model to high-dimensional data. In addition to the top-down connections in the generative model, it makes use of bottom-up connections for approximating the probability distribution over the hidden units given the data, and it trains these bottom-up connections using a simple delta rule. We use a variety of synthetic and real data sets to compare the performance of the wake-sleep algorithm with Monte Carlo and mean field methods for fitting the same generative model and also compare it with other models that are less powerful but easier to fit. 1 INTRODUCTION Neural networks are often used as bottom-up recognition devices that transform input vectors into representations of those vectors in one or more hidden layers. But multilayer networks of stochastic neurons can also be used as top-down generative models that produce patterns with complicated correlational structure in the bottom visible layer. In this paper we consider generative models composed of layers of stochastic binary logistic units. Given a generative model parameterized by top-down weights, there is an obvious way to perform unsupervised learning. The generative weights are adjusted to maximize the probability that the visible vectors generated by the model would match the observed data.


Using Unlabeled Data for Supervised Learning

Neural Information Processing Systems

Geoffrey Towell, Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540. Abstract: Many classification problems have the property that the only costly part of obtaining examples is the class label. This paper suggests a simple method for using distribution information contained in unlabeled examples to augment labeled examples in a supervised training framework. Empirical tests show that the technique described in this paper can significantly improve the accuracy of a supervised learner when the learner is well below its asymptotic accuracy level. 1 INTRODUCTION Supervised learning problems often have the following property: unlabeled examples have little or no cost while class labels have a high cost. For example, it is trivial to record hours of heartbeats from hundreds of patients. However, it is expensive to hire cardiologists to label each of the recorded beats.


Is Learning The n-th Thing Any Easier Than Learning The First?

Neural Information Processing Systems

This paper investigates learning in a lifelong context. Lifelong learning addresses situations in which a learner faces a whole stream of learning tasks. Such scenarios provide the opportunity to transfer knowledge across multiple learning tasks, in order to generalize more accurately from less training data. In this paper, several different approaches to lifelong learning are described, and applied in an object recognition domain. It is shown that across the board, lifelong learning approaches generalize consistently more accurately from less training data, by their ability to transfer knowledge across learning tasks. 1 Introduction Supervised learning is concerned with approximating an unknown function based on examples. Virtually all current approaches to supervised learning assume that one is given a set of input-output examples, denoted by X, which characterize an unknown function, denoted by f.


Generating Accurate and Diverse Members of a Neural-Network Ensemble

Neural Information Processing Systems

In particular, combining separately trained neural networks (commonly referred to as a neural-network ensemble) has been demonstrated to be particularly successful (Alpaydin, 1993; Drucker et al., 1994; Hansen and Salamon, 1990; Hashem et al., 1994; Krogh and Vedelsby, 1995; Maclin and Shavlik, 1995; Perrone, 1992). Both theoretical (Hansen and Salamon, 1990; Krogh and Vedelsby, 1995) and empirical (Hashem et al., 1994; Maclin and Shavlik, 1995) work has shown that a good ensemble is one where the individual networks are both accurate and make their errors on different parts of the input space; however, most previous work has either focussed on combining the output of multiple trained networks or only indirectly addressed how we should generate a good set of networks.


Using Pairs of Data-Points to Define Splits for Decision Trees

Neural Information Processing Systems

CART either split the data using axis-aligned hyperplanes or they perform a computationally expensive search in the continuous space of hyperplanes with unrestricted orientations. We show that the limitations of the former can be overcome without resorting to the latter. For every pair of training data-points, there is one hyperplane that is orthogonal to the line joining the data-points and bisects this line. Such hyperplanes are plausible candidates for splits. In a comparison on a suite of 12 datasets we found that this method of generating candidate splits outperformed the standard methods, particularly when the training sets were small. 1 Introduction Binary decision trees come in many flavours, but they all rely on splitting the set of k-dimensional data-points at each internal node into two disjoint sets.
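The pairwise construction described in this abstract can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names (`bisecting_hyperplane`, `on_b_side`, `candidate_splits`) are hypothetical, and the abstract does not say how candidates are scored or selected, so that step is left out.

```python
from itertools import combinations


def bisecting_hyperplane(a, b):
    """Return (w, t) describing the hyperplane w . x = t.

    The normal w = b - a makes the hyperplane orthogonal to the line
    joining a and b; choosing t = w . (a + b) / 2 makes it pass through
    the midpoint, so it bisects that line.
    """
    w = [bi - ai for ai, bi in zip(a, b)]
    t = sum(wi * (ai + bi) / 2.0 for wi, ai, bi in zip(w, a, b))
    return w, t


def on_b_side(x, w, t):
    """True when x lies strictly on the side of the hyperplane containing b."""
    return sum(wi * xi for wi, xi in zip(w, x)) > t


def candidate_splits(points):
    """One candidate hyperplane per unordered pair of points (assumed enumeration)."""
    return [bisecting_hyperplane(p, q) for p, q in combinations(points, 2)]
```

For n training points this enumeration yields n(n-1)/2 candidates, a finite set with unrestricted orientations that needs no search over the continuous space of hyperplanes.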


REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities - Application to Transition-Based Connectionist Speech Recognition

Neural Information Processing Systems

In this paper, we introduce REMAP, an approach for the training and estimation of posterior probabilities using a recursive algorithm that is reminiscent of the EM-based Forward-Backward (Liporace 1982) algorithm for the estimation of sequence likelihoods. Although very general, the method is developed in the context of a statistical model for transition-based speech recognition using Artificial Neural Networks (ANN) to generate probabilities for Hidden Markov Models (HMMs). In the new approach, we use local conditional posterior probabilities of transitions to estimate global posterior probabilities of word sequences. Although we still use ANNs to estimate posterior probabilities, the network is trained with targets that are themselves estimates of local posterior probabilities. An initial experimental result shows a significant decrease in error-rate in comparison to a baseline system. 1 INTRODUCTION The ultimate goal in speech recognition is to determine the sequence of words that has been uttered.


Adaptive Back-Propagation in On-Line Learning of Multilayer Networks

Neural Information Processing Systems

This research has been motivated by the dominance of the suboptimal symmetric phase in online learning of two-layer feedforward networks trained by gradient descent [2]. This trapping is emphasized for inappropriately small learning rates but exists in all training scenarios, affecting the learning process considerably. We proposed an adaptive back-propagation training algorithm [Eq.


Human Reading and the Curse of Dimensionality

Neural Information Processing Systems

Whereas optical character recognition (OCR) systems learn to classify single characters, people learn to classify long character strings in parallel, within a single fixation. This difference is surprising because high dimensionality is associated with poor classification learning. This paper suggests that the human reading system avoids these problems because the number of to-be-classified images is reduced by consistent and optimal eye fixation positions, and by character sequence regularities. An interesting difference exists between human reading and optical character recognition (OCR) systems. The input/output dimensionality of character classification in human reading is much greater than that for OCR systems (see Figure 1). OCR systems classify one character at a time, while the human reading system classifies as many as 8-13 characters per eye fixation (Rayner, 1979) and within a fixation, character category and sequence information is extracted in parallel (Blanchard, McConkie, Zola, and Wolverton, 1984; Reicher, 1969).


The Innovative Applications of Artificial Intelligence Conference: Past and Future

AI Magazine

This article is a reflection on the goals and focus of the Innovative Applications of Artificial Intelligence (IAAI) Conference. The author begins with an historical review of the conference. He then goes on to discuss the role of the IAAI conference, including an examination of the relationship between AI scientific research and the application of AI technology. He concludes with a presentation of the new vision for the IAAI conference.