Effects of Noise on Convergence and Generalization in Recurrent Networks

Neural Information Processing Systems

We introduce and study methods of inserting synaptic noise into dynamically-driven recurrent neural networks and show that applying a controlled amount of noise during training may improve convergence and generalization. In addition, we analyze the effects of each noise parameter (additive vs. multiplicative, cumulative vs. non-cumulative, per time step vs. per string) and predict that best overall performance can be achieved by injecting additive noise at each time step. Extensive simulations on learning the dual parity grammar from temporal strings substantiate these predictions.
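As a concrete illustration, here is a minimal sketch (not the authors' code; names such as rnn_step and noise_std are assumptions) of the configuration the analysis favors: additive, non-cumulative synaptic noise injected at each time step of a simple recurrent network, and switched off at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(W_rec, W_in, h, x, noise_std=0.0):
    # Non-cumulative: a fresh noise sample perturbs the weights on each
    # call and is then discarded; the stored weights are never changed.
    W_noisy = W_rec + rng.normal(0.0, noise_std, size=W_rec.shape)
    return np.tanh(W_noisy @ h + W_in @ x)

def run_string(W_rec, W_in, h0, xs, noise_std=0.0):
    # Per-time-step injection: noise is redrawn at every symbol.
    h = h0
    for x in xs:
        h = rnn_step(W_rec, W_in, h, x, noise_std)
    return h
```

During training one would pass a small noise_std (for example 0.05) and set it back to zero for evaluation, so the learned weights themselves are never perturbed permanently.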


Learning Many Related Tasks at the Same Time with Backpropagation

Neural Information Processing Systems

Hinton [6] proposed that generalization in artificial neural nets should improve if nets learn to represent the domain's underlying regularities. Abu-Mostafa's work on hints [1] shows that the outputs of a backprop net can be used as inputs through which domain-specific information can be given to the net. We extend these ideas by showing that a backprop net learning many related tasks at the same time can use these tasks as inductive bias for each other and thus learn better. We identify five mechanisms by which multitask backprop improves generalization and give empirical evidence that multitask backprop generalizes better in real domains.
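A minimal sketch of the shared-representation idea, assuming a single hidden layer feeding one output head per task (sizes and names are illustrative, not the paper's setup): the error signals from all tasks flow back into the same hidden weights, which is how the tasks bias each other.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_tasks = 10, 8, 4
W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))    # shared hidden layer
W2 = rng.normal(0.0, 0.1, (n_tasks, n_hidden)) # one output head per task

def forward(x):
    h = np.tanh(W1 @ x)
    return h, W2 @ h                           # all task outputs at once

def multitask_grads(x, y):
    # Squared-error loss summed over tasks for one example; every task's
    # error signal reaches the shared weights W1, which is the bias effect.
    h, y_hat = forward(x)
    err = y_hat - y
    gW2 = np.outer(err, h)
    gW1 = np.outer((W2.T @ err) * (1.0 - h**2), x)
    return gW1, gW2
```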


A Rapid Graph-based Method for Arbitrary Transformation-Invariant Pattern Classification

Neural Information Processing Systems

We present a graph-based method for rapid, accurate search through prototypes for transformation-invariant pattern classification. Our method has in theory the same recognition accuracy as other recent methods based on "tangent distance" [Simard et al., 1994], since it uses the same categorization rule. Nevertheless, ours is significantly faster during classification because far fewer tangent distances need be computed. Crucial to the success of our system are 1) a novel graph architecture in which transformation constraints and geometric relationships among prototypes are encoded during learning, and 2) an improved graph search criterion, used during classification. These architectural insights are applicable to a wide range of problem domains.
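For reference, a minimal sketch of the one-sided tangent distance of Simard et al. that underlies the shared categorization rule; the graph architecture and search criterion themselves are not reproduced here, and all names are illustrative.

```python
import numpy as np

def tangent_distance(x, prototype, tangents):
    # tangents: (k, d) array of transformation directions (e.g. small
    # rotations, translations) evaluated at the prototype.
    T = np.atleast_2d(tangents)
    # Least-squares projection of (x - prototype) onto the tangent plane.
    a, *_ = np.linalg.lstsq(T.T, x - prototype, rcond=None)
    residual = x - prototype - T.T @ a
    return np.linalg.norm(residual)
```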



An experimental comparison of recurrent neural networks

Neural Information Processing Systems

Many different discrete-time recurrent neural network architectures have been proposed. However, there has been virtually no effort to compare these architectures experimentally. In this paper we review and categorize many of these architectures and compare how they perform on various classes of simple problems including grammatical inference and nonlinear system identification.
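To make the categorization concrete, a hedged sketch contrasting two of the architecture classes typically compared in this setting: a first-order (Elman-style) update and a second-order update of the kind often used for grammatical inference. Names are illustrative, not drawn from the paper.

```python
import numpy as np

def first_order_step(W_rec, W_in, h, x):
    # Elman-style simple recurrent network update.
    return np.tanh(W_rec @ h + W_in @ x)

def second_order_step(W, h, x):
    # W: (n_hidden, n_hidden, n_in); weights on products h[j] * x[k],
    # the form often used for learning finite-state grammars.
    return np.tanh(np.einsum('ijk,j,k->i', W, h, x))
```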


Estimating Conditional Probability Densities for Periodic Variables

Neural Information Processing Systems

In this paper we introduce three novel techniques for estimating conditional probability densities of periodic variables, and investigate their performance using synthetic data. We then apply these techniques to the problem of extracting the distribution of wind vector directions from radar scatterometer data gathered by a remote-sensing satellite.
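The abstract does not name the three techniques, so the following is a hedged illustration only: one natural building block for a density over an angle theta in [-pi, pi) is a mixture of von Mises (circular normal) kernels whose parameters a network would predict from the conditioning input.

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of order zero

def von_mises_pdf(theta, mu, kappa):
    # Circular normal density on [-pi, pi) with mean direction mu
    # and concentration kappa.
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * i0(kappa))

def mixture_pdf(theta, weights, mus, kappas):
    # p(theta | x): weights, mus and kappas would be the outputs of a
    # network evaluated on the conditioning input x (not shown here).
    return sum(w * von_mises_pdf(theta, m, k)
               for w, m, k in zip(weights, mus, kappas))
```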


Active Learning with Statistical Models

Neural Information Processing Systems

For many types of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992; Cohn, 1994]. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.
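As a hedged, simplified sketch in the spirit of this kind of data selection (not the paper's exact criterion, which minimizes expected predictive variance): fit locally weighted regression and query where a local variance estimate is largest. All names and the variance proxy are illustrative assumptions.

```python
import numpy as np

def lwr_fit(xq, X, y, bandwidth=0.3):
    # Locally weighted linear regression at scalar query xq; returns the
    # prediction and a crude local variance estimate from the weighted
    # residuals (a stand-in for a full predictive-variance computation).
    w = np.exp(-0.5 * ((X - xq) / bandwidth) ** 2)
    A = np.stack([np.ones_like(X), X], axis=1)
    WA = A * w[:, None]
    beta = np.linalg.solve(A.T @ WA + 1e-8 * np.eye(2), WA.T @ y)
    resid = y - A @ beta
    s2 = np.sum(w * resid**2) / max(np.sum(w), 1e-8)
    return np.array([1.0, xq]) @ beta, s2

def pick_query(candidates, X, y):
    # Query where the local variance estimate is largest.
    return max(candidates, key=lambda xq: lwr_fit(xq, X, y)[1])
```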


Non-linear Prediction of Acoustic Vectors Using Hierarchical Mixtures of Experts

Neural Information Processing Systems

We are concerned in this paper with the application of multiple models, specifically the Hierarchical Mixtures of Experts, to time series prediction, in particular the problem of predicting acoustic vectors for use in speech coding. There have been a number of applications of multiple models in time series prediction. A classic example is the Threshold Autoregressive model (TAR), which Tong & Lim (1980) used to predict sunspot activity. More recently, Lewis, Kay and Stevens (in Weigend & Gershenfeld (1994)) describe the application of Multivariate Adaptive Regression Splines (MARS) to the prediction of future values of currency exchange rates. Finally, in speech prediction, Cuperman & Gersho (1985) describe the Switched Inter-frame Vector Prediction (SIVP) method, which switches between separate linear predictors trained on different statistical classes of speech.
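A minimal single-level sketch of the mixture-of-experts prediction step (the hierarchical version nests this gating recursively; names are illustrative): a softmax gate blends several linear experts, all conditioned on the lagged input.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_predict(x, gate_W, expert_Ws):
    # x: lagged input vector; gate_W: (n_experts, d_in);
    # expert_Ws: (n_experts, d_out, d_in) linear predictors.
    g = softmax(gate_W @ x)                     # mixing proportions
    preds = np.array([W @ x for W in expert_Ws])
    return np.tensordot(g, preds, axes=1)       # gated blend of experts
```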


Visual Speech Recognition with Stochastic Networks

Neural Information Processing Systems

This paper presents ongoing work on a speaker independent visual speech recognition system. The work presented here builds on previous research efforts in this area and explores the potential use of simple hidden Markov models for limited vocabulary, speaker independent visual speech recognition. The task at hand is recognition of the first four English digits, a task with possible applications in car-phone dialing. The images were modeled as mixtures of independent Gaussian distributions, and the temporal dependencies were captured with standard left-to-right hidden Markov models. The results indicate that simple hidden Markov models may be used to successfully recognize relatively unprocessed image sequences.
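A minimal sketch of the scoring step such a system needs, assuming the per-frame emission log-likelihoods from the per-state Gaussian mixtures are already computed: the forward algorithm for a left-to-right HMM. Names are illustrative, not the authors' code.

```python
import numpy as np
from scipy.special import logsumexp

def forward_loglik(log_A, log_pi, log_B):
    # log_A: (S, S) transition log-probs (left-to-right: entries below
    # the diagonal are -inf); log_pi: (S,) initial log-probs;
    # log_B: (T, S) per-frame emission log-likelihoods from the
    # per-state Gaussian mixtures.
    alpha = log_pi + log_B[0]
    for t in range(1, len(log_B)):
        alpha = log_B[t] + logsumexp(alpha[:, None] + log_A, axis=0)
    return logsumexp(alpha)  # log p(image sequence | word model)
```

Recognition of the four digits then amounts to evaluating this score under each digit's word model and picking the largest.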


Hierarchical Mixtures of Experts Methodology Applied to Continuous Speech Recognition

Neural Information Processing Systems

In this paper, we incorporate the Hierarchical Mixtures of Experts (HME) method of probability estimation, developed by Jordan [1], into an HMM-based continuous speech recognition system. The resulting system can be thought of as a continuous-density HMM system, but instead of using Gaussian mixtures, the HME system employs a large set of hierarchically organized but relatively small neural networks to perform the probability density estimation. The hierarchical structure is reminiscent of a decision tree except for two important differences: each "expert" or neural net performs a "soft" decision rather than a hard decision, and, unlike ordinary decision trees, the parameters of all the neural nets in the HME are automatically trainable using the EM algorithm. We report results on the ARPA 5,000-word and 40,000-word Wall Street Journal corpus using HME models. Recent research has shown that a continuous-density HMM (CD-HMM) system can outperform a more constrained tied-mixture HMM system for large-vocabulary continuous speech recognition (CSR) when a large amount of training data is available [2]. In other work, the utility of decision trees has been demonstrated in classification problems by using the "divide and conquer" paradigm effectively, where a problem is divided into a hierarchical set of simpler problems.
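A minimal sketch of the "soft decision tree" structure described above, for a two-level HME with illustrative names: gates make soft splits, small expert nets supply leaf posteriors, and the blended posteriors would then be converted to scaled likelihoods for the HMM.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hme_posterior(x, top_gate, sub_gates, experts):
    # top_gate: (b, d); sub_gates: (b, b, d); experts: (b, b, c, d),
    # where b is the branching factor and c the number of classes.
    g_top = softmax(top_gate @ x)              # soft split at the root
    p = np.zeros(experts.shape[2])
    for i, g_i in enumerate(g_top):
        g_sub = softmax(sub_gates[i] @ x)      # soft split at level two
        for j, g_ij in enumerate(g_sub):
            p += g_i * g_ij * softmax(experts[i, j] @ x)
    return p                                   # blended class posteriors
```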