AITopics

The lateral connections learn the correlations of activity between units on the map. The resulting excitatory connections focus the activity into local patches and the inhibitory connections decorrelate redundant activity on the map. The map thus forms internal representations that are easy to recognize with e.g. a perceptron network. The recognition rate on a subset of NIST database 3 is 4.0% higher with LISSOM than with a regular Self-Organizing Map (SOM) as the front end, and 15.8% higher than recognition of raw input bitmaps directly. These results form a promising starting point for building pattern recognition systems with a LISSOM map as a front end. 1 Introduction Handwritten digit recognition has become one of the touchstone problems in neural networks recently.

lateral connection, lissom map, representation, (13 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.42)

Neuron-MOS Temporal Winner Search Hardware for Fully-Parallel Data Processing

Shibata, Tadashi, Nakai, Tsutomu, Morimoto, Tatsuo, Kaihara, Ryu, Yamashita, Takeo, Ohmi, Tadahiro

Search for the largest (or the smallest) among a number of input data, Le., the winner-take-all (WTA) action, is an essential part of intelligent data processing such as data retrieval in associative memories [3], vector quantization circuits [4], Kohonen's self-organizing maps [5] etc. In addition to the maximum or minimum search, data sorting also plays an essential role in a number of signal processing such as median filtering in image processing, evolutionary algorithms in optimizing problems [6] and so forth.

neuron-mo temporal winner search hardware, opération, test circuit, (12 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Industry: Information Technology > Software (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Frey, Brendan J., Hinton, Geoffrey E., Dayan, Peter

Does the Wake-sleep Algorithm Produce Good Density Estimators?

The wake-sleep algorithm (Hinton, Dayan, Frey and Neal 1995) is a relatively efficient method of fitting a multilayer stochastic generative model to high-dimensional data. In addition to the top-down connections in the generative model, it makes use of bottom-up connections for approximating the probability distribution over the hidden units given the data, and it trains these bottom-up connections using a simple delta rule. We use a variety of synthetic and real data sets to compare the performance of the wake-sleep algorithm with Monte Carlo and mean field methods for fitting the same generative model and also compare it with other models that are less powerful but easier to fit. 1 INTRODUCTION Neural networks are often used as bottom-up recognition devices that transform input vectors into representations of those vectors in one or more hidden layers. But multilayer networks of stochastic neurons can also be used as top-down generative models that produce patterns with complicated correlational structure in the bottom visible layer. In this paper we consider generative models composed of layers of stochastic binary logistic units. Given a generative model parameterized by top-down weights, there is an obvious way to perform unsupervised learning. The generative weights are adjusted to maximize the probability that the visible vectors generated by the model would match the observed data.

algorithm produce good density estimator, helmholtz machine, probability, (10 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)

Using Unlabeled Data for Supervised Learning

Towell, Geoffrey G.

For example, it is trivial to record hours of heartbeats from hundreds of patients. However, it is expensive to hire cardiologists to label each of the recorded beats. One response to the expense of class labels is to squeeze the most information possible out of each labeled example. Regularization and cross-validation both have this goal. A second response is to start with a small set of labeled examples and request labels of only those currently unlabeled examples that are expected to provide a significant improvement in the behavior of the classifier (Lewis & Catlett, 1994; Freund et al., 1993). A third response is to tap into a largely ignored potential source of information; namely, unlabeled examples. This response is supported by the theoretical work of Castelli and Cover (1995) which suggests that unlabeled examples have value in learning classification problems.

information, sulu, unlabeled example, (15 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report > New Finding (0.69)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.42)

Is Learning The n-th Thing Any Easier Than Learning The First?

Thrun, Sebastian

This paper investigates learning in a lifelong context. Lifelong learning addresses situations in which a learner faces a whole stream of learning tasks. Such scenarios provide the opportunity to transfer knowledge across multiple learning tasks, in order to generalize more accurately from less training data. In this paper, several different approaches to lifelong learning are described, and applied in an object recognition domain. It is shown that across the board, lifelong learning approaches generalize consistently more accurately from less training data, by their ability to transfer knowledge across learning tasks.

knowledge, neural network, representation, (15 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
Asia > Middle East > Israel (0.05)
(4 more...)

Genre:

Overview (0.74)
Research Report (0.54)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.77)

Schaal, Stefan, Atkeson, Christopher G.

From Isolation to Cooperation: An Alternative View of a System of Experts

We introduce a constructive, incremental learning system for regression problems that models data by means of locally linear experts. In contrast to other approaches, the experts are trained independently and do not compete for data during learning. Only when a prediction for a query is required do the experts cooperate by blending their individual predictions. Each expert is trained by minimizing a penalized local cross validation error using second order methods. In this way, an expert is able to find a local distance metric by adjusting the size and shape of the receptive field in which its predictions are valid, and also to detect relevant input features by adjusting its bias on the importance of individual input dimensions. We derive asymptotic results for our method. In a variety of simulations the properties of the algorithm are demonstrated with respect to interference, learning speed, prediction accuracy, feature detection, and task oriented incremental learning.

prediction, receptive field, ridge regression parameter, (14 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.05)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Jaakkola, Tommi, Saul, Lawrence K., Jordan, Michael I.

Fast Learning by Bounding Likelihoods in Sigmoid Type Belief Networks

Sigmoid type belief networks, a class of probabilistic neural networks, provide a natural framework for compactly representing probabilistic information in a variety of unsupervised and supervised learning problems. Often the parameters used in these networks need to be learned from examples. Unfortunately, estimating the parameters via exact probabilistic calculations (i.e, the EMalgorithm) is intractable even for networks with fairly small numbers of hidden units. We propose to avoid the infeasibility of the E step by bounding likelihoods instead of computing them exactly. We introduce extended and complementary representations for these networks and show that the estimation of the network parameters can be made fast (reduced to quadratic optimization) by performing the estimation in either of the alternative domains.

bounding likelihood, probability, representation, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Asia > Middle East > Jordan (0.06)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Hihi, Salah El, Bengio, Yoshua

Hierarchical Recurrent Neural Networks for Long-Term Dependencies

Learning long-term dependencies is not as difficult with NARX recurrent neural networks.

dependency, long-term dependency, time scale, (15 more...)

Country:

North America > Canada > Quebec > Montreal (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Saul, Lawrence K., Jordan, Michael I.

Exploiting Tractable Substructures in Intractable Networks

We develop a refined mean field approximation for inference and learning in probabilistic neural networks. Our mean field theory, unlike most, does not assume that the units behave as independent degrees of freedom; instead, it exploits in a principled way the existence of large substructures that are computationally tractable. To illustrate the advantages of this framework, we show how to incorporate weak higher order interactions into a first-order hidden Markov model, treating the corrections (but not the first order structure) within mean field theory. 1 INTRODUCTION Learning the parameters in a probabilistic neural network may be viewed as a problem in statistical estimation.

approximation, exploiting tractable substructure, mean field approximation, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Asia > Middle East > Jordan (0.07)
North America > United States > Hawaii (0.04)
North America > United States > California > San Mateo County > Redwood City (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

Ghahramani, Zoubin, Jordan, Michael I.

Factorial Hidden Markov Models

Due to the simplicity and efficiency of its parameter estimation algorithm, the hidden Markov model (HMM) has emerged as one of the basic statistical tools for modeling discrete time series, finding widespread application in the areas of speech recognition (Rabiner and Juang, 1986) and computational molecular biology (Baldi et al., 1994). An HMM is essentially a mixture model, encoding information about the history of a time series in the value of a single multinomial variable (the hidden state). This multinomial assumption allows an efficient parameter estimation algorithm to be derived (the Baum-Welch algorithm). However, it also severely limits the representational capacity of HMMs.

algorithm, markov model, state representation, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.07)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)