Tempering Backpropagation Networks: Not All Weights are Created Equal
Schraudolph, Nicol N., Sejnowski, Terrence J.
Backpropagation learning algorithms typically collapse the network's structure into a single vector of weight parameters to be optimized. We suggest that their performance may be improved by utilizing the structural information instead of discarding it, and introduce a framework for "tempering" each weight accordingly. In the tempering model, activation and error signals are treated as approximately independent random variables. The characteristic scale of weight changes is then matched to that of the residuals, allowing structural properties such as a node's fan-in and fan-out to affect the local learning rate and backpropagated error. The model also permits calculation of an upper bound on the global learning rate for batch updates, which in turn leads to different update rules for bias vs. non-bias weights. This approach yields hitherto unparalleled performance on the family relations benchmark, a deep multi-layer network: for both batch learning with momentum and the delta-bar-delta algorithm, convergence at the optimal learning rate is sped up by more than an order of magnitude.
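A minimal sketch of the tempering idea, under my own simplifying assumption that each layer's learning rate is scaled by the inverse of the receiving units' fan-in so that the expected step size matches the scale of the residual signal (this is one common reading of the abstract, not the paper's exact derivation):

```python
import numpy as np

rng = np.random.default_rng(0)

def tempered_sgd_step(W, x, delta, base_lr=0.1):
    """One update of a weight matrix W with shape (fan_out, fan_in).

    x     : activations feeding into the layer, shape (fan_in,)
    delta : backpropagated error at the layer's output, shape (fan_out,)
    The per-layer rate is divided by the fan-in, using structural
    information rather than a single global learning rate.
    """
    fan_in = W.shape[1]
    lr = base_lr / fan_in            # "temper" the step by the node's fan-in
    return W + lr * np.outer(delta, x)

W = rng.normal(size=(3, 5))
x = rng.normal(size=5)
delta = rng.normal(size=3)
W_new = tempered_sgd_step(W, x, delta)
```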
Exploiting Tractable Substructures in Intractable Networks
Saul, Lawrence K., Jordan, Michael I.
We develop a refined mean field approximation for inference and learning in probabilistic neural networks. Our mean field theory, unlike most, does not assume that the units behave as independent degrees of freedom; instead, it exploits in a principled way the existence of large substructures that are computationally tractable. To illustrate the advantages of this framework, we show how to incorporate weak higher order interactions into a first-order hidden Markov model, treating the corrections (but not the first order structure) within mean field theory. 1 INTRODUCTION Learning the parameters in a probabilistic neural network may be viewed as a problem in statistical estimation.
Improving Elevator Performance Using Reinforcement Learning
Crites, Robert H., Barto, Andrew G.
This paper describes the application of reinforcement learning (RL) to the difficult real world problem of elevator dispatching. The elevator domain poses a combination of challenges not seen in most RL research to date. Elevator systems operate in continuous state spaces and in continuous time as discrete event dynamic systems. Their states are not fully observable and they are nonstationary due to changing passenger arrival rates. In addition, we use a team of RL agents, each of which is responsible for controlling one elevator car.
Learning with ensembles: How overfitting can be useful
We study the characteristics of learning with ensembles. Solving exactly the simple model of an ensemble of linear students, we find surprisingly rich behaviour. For learning in large ensembles, it is advantageous to use under-regularized students, which actually over-fit the training data. Globally optimal performance can be obtained by choosing the training set sizes of the students appropriately. For smaller ensembles, optimization of the ensemble weights can yield significant improvements in ensemble generalization performance, in particular if the individual students are subject to noise in the training process. Choosing students with a wide range of regularization parameters makes this improvement robust against changes in the unknown level of noise in the training data. 1 INTRODUCTION An ensemble is a collection of a (finite) number of neural networks or other types of predictors that are trained for the same task.
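A toy illustration of the central claim, assuming ridge-regression "students" and a linear teacher (my own construction, not the paper's analytical model): under-regularized students each over-fit their subset of the data, yet averaging them can reduce the test error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy linear teacher
n, d = 40, 20
X = rng.normal(size=(n, d))
w_teacher = rng.normal(size=d)
y = X @ w_teacher + 0.5 * rng.normal(size=n)

def ridge_fit(X, y, lam):
    """Ridge-regression 'student'; a small lam means under-regularized."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Train K students on random subsets (small lam => each one over-fits),
# then average their predictions.
K, lam = 10, 1e-3
students = []
for _ in range(K):
    idx = rng.choice(n, size=int(0.6 * n), replace=False)
    students.append(ridge_fit(X[idx], y[idx], lam))

X_test = rng.normal(size=(200, d))
y_test = X_test @ w_teacher
pred_single = X_test @ students[0]
pred_ensemble = X_test @ np.mean(students, axis=0)
print("single student MSE:", np.mean((pred_single - y_test) ** 2))
print("ensemble MSE:      ", np.mean((pred_ensemble - y_test) ** 2))
```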
Finite State Automata that Recurrent Cascade-Correlation Cannot Represent
This paper relates the computational power of Fahlman's Recurrent Cascade Correlation (RCC) architecture to that of finite state automata (FSA). While some recurrent networks are FSA equivalent, RCC is not. The paper presents a theoretical analysis of the RCC architecture in the form of a proof describing a large class of FSA which cannot be realized by RCC. 1 INTRODUCTION Recurrent networks can be considered to be defined by two components: a network architecture, and a learning rule. The former describes how a network with a given set of weights and topology computes its output values, while the latter describes how the weights (and possibly topology) of the network are updated to fit a specific problem. It is possible to evaluate the computational power of a network architecture by analyzing the types of computations a network could perform assuming appropriate connection weights (and topology).
Factorial Hidden Markov Models
Ghahramani, Zoubin, Jordan, Michael I.
Due to the simplicity and efficiency of its parameter estimation algorithm, the hidden Markov model (HMM) has emerged as one of the basic statistical tools for modeling discrete time series, finding widespread application in the areas of speech recognition (Rabiner and Juang, 1986) and computational molecular biology (Baldi et al., 1994). An HMM is essentially a mixture model, encoding information about the history of a time series in the value of a single multinomial variable (the hidden state). This multinomial assumption allows an efficient parameter estimation algorithm to be derived (the Baum-Welch algorithm). However, it also severely limits the representational capacity of HMMs.
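A small numerical sketch of why the single-multinomial assumption is limiting (my own illustration, not the paper's algorithm): a factorial model with M independent binary chains corresponds to an ordinary HMM whose single hidden state takes 2**M values, with a transition matrix given by the Kronecker product of the per-chain matrices.

```python
import numpy as np

A1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])   # transition probabilities of binary chain 1
A2 = np.array([[0.7, 0.3],
               [0.4, 0.6]])   # transition probabilities of binary chain 2

# Flattening two binary chains into one multinomial state:
A_joint = np.kron(A1, A2)     # 4 x 4 transition matrix over the joint state
print(A_joint.shape)          # (4, 4): the flat representation grows as 2**M
print(A_joint.sum(axis=1))    # rows still sum to 1 (it is a valid HMM)
```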
Gradient and Hamiltonian Dynamics Applied to Learning in Neural Networks
Howse, James W., Abdallah, Chaouki T., Heileman, Gregory L.
The process of machine learning can be considered in two stages: model selection and parameter estimation. In this paper a technique is presented for constructing dynamical systems with desired qualitative properties. The approach is based on the fact that an n-dimensional nonlinear dynamical system can be decomposed into one gradient and (n - 1) Hamiltonian systems. Thus, the model selection stage consists of choosing the gradient and Hamiltonian portions appropriately so that a certain behavior is obtainable. To estimate the parameters, a stably convergent learning rule is presented.
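A minimal linear illustration of the gradient/Hamiltonian split (my own toy example, not the paper's general nonlinear construction): a vector field dx/dt = A x decomposes into a symmetric part (a dissipative gradient flow) and an antisymmetric part (a norm-preserving Hamiltonian rotation).

```python
import numpy as np

A = np.array([[-1.0,  2.0],
              [-3.0, -0.5]])

S = 0.5 * (A + A.T)   # symmetric part: a gradient flow (dissipative)
K = 0.5 * (A - A.T)   # antisymmetric part: a rotation that conserves ||x||^2

# Numerically verify the conservation property of the Hamiltonian part.
x = np.array([1.0, 0.0])
dt = 1e-3
for _ in range(1000):
    x = x + dt * (K @ x)          # Euler steps of dx/dt = K x
print(np.linalg.norm(x))          # stays close to 1
```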
Universal Approximation and Learning of Trajectories Using Oscillators
Natural and artificial neural circuits must be capable of traversing specific state space trajectories. A natural approach to this problem is to learn the relevant trajectories from examples. Unfortunately, gradient descent learning of complex trajectories in amorphous networks is unsuccessful. We suggest a possible approach where trajectories are realized by combining simple oscillators, in various modular ways. We contrast two regimes of fast and slow oscillations. In all cases, we show that banks of oscillators with bounded frequencies have universal approximation properties. Open questions are also discussed briefly.
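A rough sketch of the flavor of the universal-approximation claim, using an assumed setup of my own: a fixed bank of bounded-frequency sinusoidal oscillators whose linear readout weights are fit by least squares to a target trajectory (the paper's networks of learned oscillators are more general than this).

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200)
target = np.sign(np.sin(2 * np.pi * 3 * t))         # square-ish target trajectory

freqs = np.arange(1, 11)                             # bank of 10 bounded frequencies
basis = np.column_stack([np.sin(2 * np.pi * f * t) for f in freqs] +
                        [np.cos(2 * np.pi * f * t) for f in freqs])

w, *_ = np.linalg.lstsq(basis, target, rcond=None)   # linear readout weights
approx = basis @ w
print("max abs error:", np.max(np.abs(approx - target)))
```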