AITopics

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > New York (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
(2 more...)

Industry: Semiconductors & Electronics (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.94)

LeCun, Yann, Simard, Patrice Y., Pearlmutter, Barak

Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors

We propose a very simple, and well principled way of computing the optimal step size in gradient descent algorithms. The online version is very efficient computationally, and is applicable to large backpropagation networks trained on large data sets. The main ingredient is a technique for estimating the principal eigenvalue(s) and eigenvector(s) of the objective function's second derivative matrix (Hessian), which does not require to even calculate the Hessian. Several other applications of this technique are proposed for speeding up learning, or for eliminating useless parameters. 1 INTRODUCTION Choosing the appropriate learning rate, or step size, in a gradient descent procedure such as backpropagation, is simultaneously one of the most crucial and expertintensive part of neural-network learning. We propose a method for computing the best step size which is both well-principled, simple, very cheap computationally, and, most of all, applicable to online training with large networks and data sets.

artificial intelligence, eigenvalue, machine learning, (13 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > Colorado > Denver County > Denver (0.05)
(2 more...)

Industry: Education > Educational Setting > Online (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.58)

Orr, Genevieve B., Leen, Todd K.

Weight Space Probability Densities in Stochastic Learning: II. Transients and Basin Hopping Times

In stochastic learning, weights are random variables whose time evolution is governed by a Markov process. We summarize the theory of the time evolution of P, and give graphical examples of the time evolution that contrast the behavior of stochastic learning with true gradient descent (batch learning). Finally, we use the formalism to obtain predictions of the time required for noise-induced hopping between basins of different optima. We compare the theoretical predictions with simulations of large ensembles of networks for simple problems in supervised and unsupervised learning. Despite the recent application of convergence theorems from stochastic approximation theory to neural network learning (Oja 1982, White 1989) there remain outstanding questions about the search dynamics in stochastic learning.

evolution, learning, time evolution, (10 more...)

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > Oregon > Washington County > Beaverton (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Kirk, David B., Kerns, Douglas, Fleischer, Kurt, Barr, Alan H.

Analog VLSI Implementation of Multi-dimensional Gradient Descent

The implementation uses noise injection and multiplicative correlation to estimate derivatives, as in [Anderson, Kerns 92]. One intended application of this technique is setting circuit parameters on-chip automatically, rather than manually [Kirk 91]. Gradient descent optimization may be used to adjust synapse weights for a backpropagation or other on-chip learning implementation. The approach combines the features of continuous multidimensional gradient descent and the potential for an annealing style of optimization. We present data measured from our analog VLSI implementation. 1 Introduction This work is similar to [Anderson, Kerns 92], but represents two advances. First, we describe the extension of the technique to multiple dimensions. Second, we demonstrate an implementation of the multidimensional technique in analog VLSI, and provide results measured from the chip. Unlike previous work using noise sources in adaptive systems, we use the noise as a means of estimating the gradient of a function f(y), rather than performing an annealing process [Alspector 88]. We also estimate gr-;:dients continuously in position and time, in contrast to [Umminger 89] and [J abri 91], which utilize discrete position gradient estimates.

analog vlsi implementation, gradient descent, gradient estimate, (11 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > Washington > King County > Seattle (0.04)
(2 more...)

Industry: Semiconductors & Electronics (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

Orr, Genevieve B., Leen, Todd K.

Weight Space Probability Densities in Stochastic Learning: II. Transients and Basin Hopping Times

In stochastic learning, weights are random variables whose time evolution is governed by a Markov process. We summarize the theory of the time evolution of P, and give graphical examples of the time evolution that contrast the behavior of stochastic learning with true gradient descent (batch learning). Finally, we use the formalism to obtain predictions of the time required for noise-induced hopping between basins of different optima. We compare the theoretical predictions with simulations of large ensembles of networks for simple problems in supervised and unsupervised learning. Despite the recent application of convergence theorems from stochastic approximation theory to neural network learning (Oja 1982, White 1989) there remain outstanding questions about the search dynamics in stochastic learning.

evolution, learning, time evolution, (10 more...)

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > Oregon > Washington County > Beaverton (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

A Parallel Gradient Descent Method for Learning in Analog VLSI Neural Networks

Alspector, J., Meir, R., Yuhas, B., Jayakumar, A., Lippe, D.

Typical methods for gradient descent in neural network learning involve calculation of derivatives based on a detailed knowledge of the network model. This requires extensive, time consuming calculations for each pattern presentation and high precision that makes it difficult to implement in VLSI. We present here a perturbation technique that measures, not calculates, the gradient. Since the technique uses the actual network as a measuring device, errors in modeling neuron activation and synaptic weights do not cause errors in gradient descent. The method is parallel in nature and easy to implement in VLSI. We describe the theory of such an algorithm, an analysis of its domain of applicability, some simulations using it and an outline of a hardware implementation.

learning, parallel gradient descent method, perturbation, (12 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > New York (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
(2 more...)

Industry: Semiconductors & Electronics (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.94)

Kirk, David B., Kerns, Douglas, Fleischer, Kurt, Barr, Alan H.

Analog VLSI Implementation of Multi-dimensional Gradient Descent

The implementation uses noise injection and multiplicative correlation to estimate derivatives, as in [Anderson, Kerns 92]. One intended application of this technique is setting circuit parameters on-chip automatically, rather than manually [Kirk 91]. Gradient descent optimization may be used to adjust synapse weights for a backpropagation or other on-chip learning implementation. The approach combines the features of continuous multidimensional gradient descent and the potential for an annealing style of optimization. We present data measured from our analog VLSI implementation. 1 Introduction This work is similar to [Anderson, Kerns 92], but represents two advances. First, we describe the extension of the technique to multiple dimensions. Second, we demonstrate an implementation of the multidimensional technique in analog VLSI, and provide results measured from the chip. Unlike previous work using noise sources in adaptive systems, we use the noise as a means of estimating the gradient of a function f(y), rather than performing an annealing process [Alspector 88]. We also estimate gr-;:dients continuously in position and time, in contrast to [Umminger 89] and [J abri 91], which utilize discrete position gradient estimates.

analog vlsi implementation, gradient descent, gradient estimate, (11 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > Washington > King County > Seattle (0.04)
(2 more...)

Industry: Semiconductors & Electronics (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

LeCun, Yann, Simard, Patrice Y., Pearlmutter, Barak

Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors

We propose a very simple, and well principled way of computing the optimal step size in gradient descent algorithms. The online version is very efficient computationally, and is applicable to large backpropagation networks trained on large data sets. The main ingredient is a technique for estimating the principal eigenvalue(s) and eigenvector(s) of the objective function's second derivative matrix (Hessian), which does not require to even calculate the Hessian. Several other applications of this technique are proposed for speeding up learning, or for eliminating useless parameters. 1 INTRODUCTION Choosing the appropriate learning rate, or step size, in a gradient descent procedure such as backpropagation, is simultaneously one of the most crucial and expertintensive part of neural-network learning. We propose a method for computing the best step size which is both well-principled, simple, very cheap computationally, and, most of all, applicable to online training with large networks and data sets.

automatic learning rate maximization, eigenvalue, hessian, (11 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > Colorado > Denver County > Denver (0.05)
(2 more...)

Industry: Education > Educational Setting > Online (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.58)

A Parallel Gradient Descent Method for Learning in Analog VLSI Neural Networks

Alspector, J., Meir, R., Yuhas, B., Jayakumar, A., Lippe, D.

Typical methods for gradient descent in neural network learning involve calculation of derivatives based on a detailed knowledge of the network model. This requires extensive, time consuming calculations for each pattern presentationand high precision that makes it difficult to implement in VLSI. We present here a perturbation technique that measures, not calculates, the gradient. Since the technique uses the actual network as a measuring device, errors in modeling neuron activation and synaptic weights do not cause errors in gradient descent. The method is parallel in nature and easy to implement in VLSI.

artificial intelligence, machine learning, perturbation, (13 more...)

Country:

North America > United States (0.29)
Asia > Middle East > Israel (0.14)

Industry: Semiconductors & Electronics (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.93)

Kirk, David B., Kerns, Douglas, Fleischer, Kurt, Barr, Alan H.

Analog VLSI Implementation of Multi-dimensional Gradient Descent

The implementation uses noise injection and multiplicative correlation to estimate derivatives, as in [Anderson, Kerns 92]. One intended application of this technique is setting circuit parameters on-chip automatically, rather than manually [Kirk 91]. Gradient descent optimization may be used to adjust synapse weights for a backpropagation or other on-chip learning implementation. The approach combines the features of continuous multidimensional gradient descent and the potential for an annealing style of optimization. We present data measured from our analog VLSI implementation. 1 Introduction This work is similar to [Anderson, Kerns 92], but represents two advances. First, we describe the extension of the technique to multiple dimensions. Second, we demonstrate animplementation of the multidimensional technique in analog VLSI, and provide results measured from the chip. Unlike previous work using noise sources in adaptive systems, we use the noise as a means of estimating the gradient of a function f(y), rather than performing an annealing process [Alspector 88]. We also estimate gr-;:dients continuously in position and time, in contrast to [Umminger 89] and [J abri 91], which utilize discrete position gradient estimates.

analog vlsi implementation, artificial intelligence, machine learning, (13 more...)

Country: North America > United States > California (0.49)

Industry: Semiconductors & Electronics (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)