AITopics

Learning to recognize or predict sequences using long-term context has many applications. However, practical and theoretical problems are found in training recurrent neural networks to perform tasks in which input/output dependencies span long intervals. Starting from a mathematical analysis of the problem, we consider and compare alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled. Results on the new algorithms show performance qualitatively superior to that obtained with backpropagation. 1 Introduction Recurrent neural networks have been considered to learn to map input sequences to output sequences. Machines that could efficiently learn such tasks would be useful for many applications involving sequence prediction, recognition or production.

algorithm, artificial intelligence, neural network, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.62)

Wang, Changfeng, Venkatesh, Santosh S., Judd, J. Stephen

Optimal Stopping and Effective Machine Complexity in Learning

We study the problem of when to stop learning a class of feedforward networks - networks with linear output neurons and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there are in general three distinct phases in the generalization performance in the learning process, and in particular, the network has better generalization performance when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of effective size of a machine is defined and used to explain the tradeoff between the complexity of the machine and the training error in the learning process. The study leads naturally to a network size selection criterion, which turns out to be a generalization of Akaike's Information Criterion for the learning process. It is shown that stopping learning before the global minimum of the empirical error has the effect of network size selection. 1 INTRODUCTION The primary goal of learning in neural nets is to find a network that gives valid generalization. In achieving this goal, a central issue is the tradeoff between the training error and network complexity. This usually reduces to a problem of network size selection, which has drawn much research effort in recent years. Various principles, theories, and intuitions, including Occam's razor, statistical model selection criteria such as Akaike's Information Criterion (AIC) [1] and many others [5, 1, 10, 3, 11] all quantitatively support the following PAC prescription: between two machines which have the same empirical error, the machine with smaller VC-dimension generalizes better. However, it is noted that these methods or criteria do not necessarily lead to optimal (or nearly optimal) generalization performance.
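As a rough illustration of stopping gradient descent before the empirical minimum is reached, the sketch below trains a linear output layer on fixed features and keeps the weights with the lowest held-out error. The data, the polynomial feature map, the learning rate, and the names `train_with_early_stopping`, `features`, and `linear_output` are all assumptions made for this example. The validation-based rule is a common heuristic and is not the stopping criterion derived in the paper.

```python
import math
import random

def features(x, k):
    # Fixed (untrained) input features; polynomial powers are a hypothetical choice.
    return [x ** j for j in range(k)]

def linear_output(w, x):
    # Linear output unit on top of the fixed features.
    return sum(wi * fi for wi, fi in zip(w, features(x, len(w))))

def mse(w, data):
    return sum((linear_output(w, x) - y) ** 2 for x, y in data) / len(data)

def train_with_early_stopping(train, val, k=8, lr=0.05, steps=2000):
    """Batch gradient descent; remembers the step with the lowest validation error."""
    w = [0.0] * k
    best_w, best_val, best_step = list(w), mse(w, val), 0
    for t in range(1, steps + 1):
        grad = [0.0] * k
        for x, y in train:
            f = features(x, k)
            err = sum(wi * fi for wi, fi in zip(w, f)) - y
            for j in range(k):
                grad[j] += 2 * err * f[j] / len(train)
        w = [wi - lr * g for wi, g in zip(w, grad)]
        v = mse(w, val)
        if v < best_val:
            best_w, best_val, best_step = list(w), v, t
    # Also return the validation error at the last step for comparison.
    return best_w, best_val, best_step, mse(w, val)

# Synthetic data (assumed): noisy samples of sin(3x) on [-1, 1].
rng = random.Random(0)

def sample(n):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, math.sin(3 * x) + rng.gauss(0, 0.3)) for x in xs]

train, val = sample(15), sample(50)
best_w, best_val, best_step, final_val = train_with_early_stopping(train, val)
print(f"best step {best_step}: val MSE {best_val:.3f}; final val MSE {final_val:.3f}")
```

By construction the retained validation error is never worse than the error at the final step; whether it is strictly better depends on the data and on how long training runs.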

artificial intelligence, generalization error, neural network, (14 more...)

Country: North America > United States (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Leen, Todd K., Orr, Genevieve B.

Optimal Stochastic Search and Adaptive Momentum

Stochastic optimization algorithms typically use learning rate schedules that behave asymptotically as μ(t)

artificial intelligence, momentum, optimization problem, (18 more...)

Country: North America > United States > California (0.29)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Jaakkola, Tommi, Jordan, Michael I., Singh, Satinder P.

Convergence of Stochastic Iterative Dynamic Programming Algorithms

Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the powerful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD(λ) and Q-learning belong. 1 INTRODUCTION Learning to predict the future and to find an optimal way of controlling it are the basic goals of learning systems that interact with their environment. A variety of algorithms are currently being studied for the purposes of prediction and control in incompletely specified, stochastic environments. Here we consider learning algorithms defined in Markov environments. There are actions or controls (u) available for the learner that affect both the state transition probabilities, and the probability distribution for the immediate, state dependent costs (c_i(u)) incurred by the learner.
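To make the setting concrete, here is a minimal sketch of tabular Q-learning in a cost-minimizing Markov environment, with per-pair step sizes that decay in the way stochastic approximation arguments usually assume (their sum diverges while the sum of their squares stays finite). The four-state chain, the unit costs, the discount factor, and the step-size exponent are all assumptions chosen for this example and do not come from the paper.

```python
import random

N_STATES, TERMINAL, GAMMA = 4, 3, 0.8
ACTIONS = (0, 1)  # 0 = stay, 1 = try to move right

def step(state, action, rng):
    """Hypothetical chain environment: unit cost per step, noisy moves."""
    if action == 1 and rng.random() < 0.8:
        state += 1
    return state, 1.0

def q_learning(episodes=5000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]     # q[TERMINAL] stays at zero
    visits = [[0, 0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(TERMINAL)               # random non-terminal start
        while s != TERMINAL:
            a = rng.choice(ACTIONS)               # uniform exploration
            s2, cost = step(s, a, rng)
            visits[s][a] += 1
            alpha = visits[s][a] ** -0.7          # decaying step size
            # Costs are minimized, so the backup uses min over next actions.
            target = cost + GAMMA * min(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = q_learning()
greedy = [min(ACTIONS, key=lambda a: q[s][a]) for s in range(TERMINAL)]
print(greedy)
```

In this toy chain the cheapest policy is to always attempt the move right, so the greedy action extracted from the learned table should be action 1 in every non-terminal state.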

algorithm, artificial intelligence, optimization problem, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.73)

Ron, Dana, Singer, Yoram, Tishby, Naftali

The Power of Amnesia

We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multiple characteristic time scales. On short scales it is characterized mostly by the dynamics that generate the process, whereas on large scales, more syntactic and semantic information is carried. For that reason the conventionally used fixed memory Markov models cannot capture effectively the complexity of such structures. On the other hand, using long memory models uniformly is not practical even for memory lengths as short as four.
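The idea of variable memory length can be sketched as follows: collect next-symbol counts for every context up to a maximum length, but retain a longer context only when its prediction differs noticeably from that of its shorter suffix. This is a simplified illustration and not the algorithm proposed in the paper; the function names, the thresholds `min_count` and `min_gain`, and the max-difference test are assumptions.

```python
from collections import Counter, defaultdict

def build_contexts(text, max_len=3, min_count=5, min_gain=0.05):
    """Return {context: next-symbol distribution} for the retained contexts."""
    counts = defaultdict(Counter)
    for i in range(len(text)):
        for k in range(max_len + 1):
            if i - k < 0:
                break
            counts[text[i - k:i]][text[i]] += 1

    def dist(ctx):
        total = sum(counts[ctx].values())
        return {s: c / total for s, c in counts[ctx].items()}

    kept = {"": dist("")}                      # the empty context is always kept
    for ctx in sorted(counts, key=len):
        if not ctx or sum(counts[ctx].values()) < min_count:
            continue
        parent, child = dist(ctx[1:]), dist(ctx)
        # Keep the longer memory only if it changes the prediction enough.
        gain = max(abs(child.get(s, 0.0) - p) for s, p in parent.items())
        if gain >= min_gain:
            kept[ctx] = child
    return kept

def predict_next(kept, history):
    """Predict the most likely next symbol from the longest retained suffix."""
    for k in range(len(history), -1, -1):
        ctx = history[len(history) - k:]
        if ctx in kept:
            return max(kept[ctx], key=kept[ctx].get)

kept = build_contexts("abracadabra" * 30)
print(sorted(kept, key=len))
print(predict_next(kept, "xbr"))
```

In this toy string, "r" is always followed by "a", so the longer context "br" adds nothing beyond "r" and is dropped, while prediction after "br" still falls back to the retained context "r".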

algorithm, artificial intelligence, automaton, (15 more...)

Country: Asia > Middle East > Israel (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Boyan, Justin A., Littman, Michael L.

Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach

"Q-routing" algorithm, related to certain distributed packet routing algorithms

algorithm, artificial intelligence, telecommunications, (18 more...)

Country: North America > United States (0.14)

Industry: Telecommunications > Networks (0.59)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Garzon, Max H., Botelho, Fernanda

Stability and Observability

We present a class of feedback control functions which accelerate convergence rates of autonomous nonlinear dynamical systems such as neural network models, without affecting the basic convergence properties (e.g.

artificial intelligence, neural network, perturbation, (12 more...)

Country: North America > United States (0.16)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Oliveira, Arlindo L., Sangiovanni-Vincentelli, Alberto

Learning Complex Boolean Functions: Algorithms and Applications

The most commonly used neural network models are not well suited to direct digital implementations because each node needs to perform a large number of operations between floating point values. Fortunately, the ability to learn from examples and to generalize is not restricted to networks of this type. Indeed, networks where each node implements a simple Boolean function (Boolean networks) can be designed in such a way as to exhibit similar properties. Two algorithms that generate Boolean networks from examples are presented. The results show that these algorithms generalize very well in a class of problems that accept compact Boolean network descriptions.

algorithm, logic programming, neural network, (17 more...)

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.54)

Robust Parameter Estimation and Model Selection for Neural Network Regression

Liu, Yong

In this paper, it is shown that the conventional back-propagation (BPP) algorithm for neural network regression is robust to leverages (data with x corrupted), but not to outliers (data with y corrupted). A robust alternative is to model the error as a mixture of normal distributions. The influence function for this mixture model is calculated and the condition for the model to be robust to outliers is given. The EM algorithm [5] is used to estimate the parameters. The usefulness of model selection criteria is also discussed.
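The effect of a mixture-of-normals error model can be sketched on a straight-line fit, used here as a simple stand-in for a neural network regressor. The example assumes a two-component, zero-mean mixture (a narrow inlier component and a wide outlier component) and alternates an E-step with a weighted least squares update, which is an EM-style scheme rather than the paper's exact procedure. The data, the initial guesses, and the function names are hypothetical.

```python
import math
import random

def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for y = a*x + b."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    a = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    return a, (sy - a * sx) / sw

def normal_pdf(e, var):
    return math.exp(-e * e / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_robust_line(xs, ys, iters=50):
    n = len(xs)
    a, b = weighted_line_fit(xs, ys, [1.0] * n)       # ordinary least squares start
    res = [y - (a * x + b) for x, y in zip(xs, ys)]
    ms = sum(e * e for e in res) / n
    var_in, var_out, p_out = ms / 4, ms * 4, 0.2      # assumed initial guesses
    for _ in range(iters):
        # E-step: probability that each residual came from the inlier component.
        r = []
        for e in res:
            d_in = (1 - p_out) * normal_pdf(e, var_in)
            d_out = p_out * normal_pdf(e, var_out)
            r.append(d_in / (d_in + d_out))
        # M-step: weighted fit, then update variances and mixing proportion.
        ws = [ri / var_in + (1 - ri) / var_out for ri in r]
        a, b = weighted_line_fit(xs, ys, ws)
        res = [y - (a * x + b) for x, y in zip(xs, ys)]
        n_in = sum(r)
        var_in = max(sum(ri * e * e for ri, e in zip(r, res)) / max(n_in, 1e-12), 1e-9)
        var_out = max(sum((1 - ri) * e * e for ri, e in zip(r, res)) / max(n - n_in, 1e-12), 1e-9)
        p_out = min(max(1 - n_in / n, 1e-6), 1 - 1e-6)
    return a, b

# Synthetic data (assumed): y = 2x + 1 with small noise, five points with y corrupted.
rng = random.Random(0)
xs = [0.25 * i for i in range(40)]
ys = [2 * x + 1 + rng.gauss(0, 0.3) for x in xs]
for i in (3, 11, 19, 27, 35):
    ys[i] += 30.0

a_ols, b_ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
a_rob, b_rob = em_robust_line(xs, ys)
print(f"least squares: a={a_ols:.2f}, b={b_ols:.2f}; mixture model: a={a_rob:.2f}, b={b_rob:.2f}")
```

The corrupted points pull the ordinary least squares intercept well away from 1, whereas the mixture model assigns them to the wide component and down-weights them in the fit.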

artificial intelligence, neural network, outlier, (13 more...)

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning in Compositional Hierarchies: Inducing the Structure of Objects from Data

Utans, Joachim

I propose a learning algorithm for learning hierarchical models for object recognition. The model architecture is a compositional hierarchy that represents part-whole relationships: parts are described in the local context of substructures of the object. The focus of this report is learning hierarchical models from data, i.e. inducing the structure of model prototypes from observed exemplars of an object. At each node in the hierarchy, a probability distribution governing its parameters must be learned. The connections between nodes reflect the structure of the object. The formulation of substructures is encouraged such that their parts become conditionally independent.

bayesian inference, neural network, node, (18 more...)

Country: North America > United States > California (0.29)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)