AITopics

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.43)

Lippe, D., Alspector, Joshua

A Study of Parallel Perturbative Gradient Descent

Neural Information Processing Systems, Dec-31-1995

Motivated by difficulties in analog VLSI implementation of back-propagation [Rumelhart et al., 1986] and related algorithms that calculate gradients based on detailed knowledge of the neural network model, there were several similar recent papers proposing to use a parallel [Alspector et al., 1993, Cauwenberghs, 1993, Kirk et al., 1993] or a semi-parallel [Flower and Jabri, 1993] perturbative technique which has the property that it measures (with the physical neural network) rather than calculates the gradient. This technique is closely related to methods of stochastic approximation [Kushner and Clark, 1978] which have been investigated recently by workers in fields other than neural networks.

multiple perturbation, parallel perturbation, perturbation, (13 more...)
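
The parallel perturbative idea described in the abstract above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the quadratic `error` function, its target values, the step size `lr`, and the perturbation size `delta` are all assumptions chosen for the example, and `error` merely stands in for an error that would be measured on the physical network.

```python
import random


def error(w):
    # Hypothetical quadratic error; a stand-in for an error measured on hardware.
    target = [1.0, -2.0, 0.5]
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def perturbative_step(w, error_fn, rng, delta=1e-3, lr=0.05):
    # Perturb every weight at the same time by +delta or -delta (chosen at random).
    signs = [rng.choice((-1.0, 1.0)) for _ in w]
    perturbed = [wi + delta * s for wi, s in zip(w, signs)]
    # Two error measurements replace an analytic gradient calculation.
    d_err = error_fn(perturbed) - error_fn(w)
    # d_err / (delta * s_i) is a noisy estimate of dE/dw_i; the contributions
    # of the other weights average out because the signs are independent.
    return [wi - lr * d_err / (delta * s) for wi, s in zip(w, signs)]


def train(steps=2000, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        w = perturbative_step(w, error, rng)
    return w
```

Calling `train()` should drive the weights close to the assumed targets. Each step needs only two error evaluations regardless of the number of weights, which is the property that makes the approach attractive when the gradient cannot easily be computed on chip.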

A Learning Analog Neural Network Chip with Continuous-Time Recurrent Dynamics

Cauwenberghs, Gert

The recurrent network, containing six continuous-time analog neurons and 42 free parameters (connection strengths and thresholds), is trained to generate time-varying outputs approximating given periodic signals presented to the network. The chip implements a stochastic perturbative algorithm, which observes the error gradient along random directions in the parameter space for error-descent learning. In addition to the integrated learning functions and the generation of pseudo-random perturbations, the chip provides for teacher forcing and long-term storage of the volatile parameters. The network learns a 1 kHz circular trajectory in 100 sec. The chip occupies 2 mm × 2 mm in a 2 μm CMOS process, and dissipates 1.2 mW.

1 Introduction

Exact gradient-descent algorithms for supervised learning in dynamic recurrent networks [1-3] are fairly complex and do not provide for a scalable implementation in a standard 2-D VLSI process. We have implemented a fairly simple and scalable

Present address: Johns Hopkins University, ECE Dept., Baltimore MD 21218-2686.

artificial intelligence, learning analog neural network chip, machine learning, (12 more...)

Country:

North America > United States > Maryland > Baltimore (0.24)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)
(2 more...)

Industry: Semiconductors & Electronics (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Das, Sreerupa, Mozer, Michael C.

A Unified Gradient-Descent/Clustering Architecture for Finite State Machine Induction

Researchers often try to understand, post hoc, representations that emerge in the hidden layers of a neural net following training. Interpretation is difficult because these representations are typically highly distributed and continuous. By "continuous," we mean that if one constructed a scatterplot over the hidden unit activity space of patterns obtained in response to various inputs, examination at any scale would reveal the patterns to be broadly distributed over the space.

dolce, representation, unified gradient-descent clustering architecture, (12 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.43)

GDS: Gradient Descent Generation of Symbolic Classification Rules

Blasig, Reinhard

Given such a classification task, in most cases it is not too difficult to devise a network architecture that is capable of learning the input-output relation as represented by a number of training examples.

gradient descent generation, interest rate, neuron, (15 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > Orange County > Irvine (0.04)
(2 more...)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)

Bengio, Yoshua, Frasconi, Paolo

Credit Assignment through Time: Alternatives to Backpropagation

Learning to recognize or predict sequences using long-term context has many applications. However, practical and theoretical problems are found in training recurrent neural networks to perform tasks in which input/output dependencies span long intervals. Starting from a mathematical analysis of the problem, we consider and compare alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled. Results on the new algorithms show performance qualitatively superior to that obtained with backpropagation.

1 Introduction

Recurrent neural networks have been considered to learn to map input sequences to output sequences. Machines that could efficiently learn such tasks would be useful for many applications involving sequence prediction, recognition or production. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. In fact, we can prove that dynamical systems such as recurrent neural networks will be increasingly difficult to train with gradient descent as the duration of the dependencies to be captured increases. A mathematical analysis of the problem shows that either one of two conditions arises in such systems.

algorithm, information, sequence, (13 more...)
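
The difficulty with long dependencies mentioned in the abstract above can be illustrated with a toy calculation. The sketch below is not the authors' analysis: the single-unit recurrence `h_t = tanh(w * h_{t-1})`, the weight `w = 0.9`, and the initial state `0.5` are assumptions chosen only to show how the sensitivity of a late state to an early one can shrink as the time span grows.

```python
import math


def sensitivity(w, h0, steps):
    # Derivative of h_T with respect to h_0 for h_t = tanh(w * h_{t-1}),
    # accumulated with the chain rule one time step at a time.
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # local derivative of tanh(w * h_prev)
    return grad


# Print the sensitivity for increasingly long time spans.
for T in (1, 10, 50):
    print(T, sensitivity(w=0.9, h0=0.5, steps=T))
```

With these assumed values every factor in the product is smaller than 0.9 in magnitude, so the sensitivity is bounded by `0.9 ** T` and decays roughly exponentially with the span `T`. A gradient-descent update that relies on this quantity therefore receives very little signal from distant time steps in this toy setting.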

Country:

North America > Canada > Quebec > Montreal (0.05)
Asia > Middle East > Jordan (0.05)
Europe > Italy (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.52)
