AITopics

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > Canada > Ontario > Toronto (0.14)

Industry: Education > Educational Setting > Online (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.58)

Neural Information Processing SystemsDec-31-1993

Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors

LeCun, Yann, Simard, Patrice Y., Pearlmutter, Barak

We propose a very simple, and well principled way of computing the optimal step size in gradient descent algorithms. The online version is very efficient computationally, and is applicable to large backpropagation networks trained on large data sets. The main ingredient is a technique for estimating the principal eigenvalue(s) and eigenvector(s) of the objective function's second derivative matrix (Hessian), which does not require to even calculate the Hessian. Several other applications of this technique are proposed for speeding up learning, or for eliminating useless parameters. 1 INTRODUCTION Choosing the appropriate learning rate, or step size, in a gradient descent procedure such as backpropagation, is simultaneously one of the most crucial and expertintensive part of neural-network learning. We propose a method for computing the best step size which is both well-principled, simple, very cheap computationally, and, most of all, applicable to online training with large networks and data sets.

artificial intelligence, eigenvalue, neural network, (14 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > Canada > Ontario > Toronto (0.14)

Industry: Education > Educational Setting > Online (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.58)

Neural Information Processing SystemsDec-31-1993

Efficient Pattern Recognition Using a New Transformation Distance

Simard, Patrice, LeCun, Yann, Denker, John S.

Memory-based classification algorithms such as radial basis functions orK-nearest neighbors typically rely on simple distances (Euclidean, dotproduct ...), which are not particularly meaningful on pattern vectors. More complex, better suited distance measures are often expensive and rather ad-hoc (elastic matching, deformable templates). We propose a new distance measure which (a) can be made locally invariant to any set of transformations of the input and (b) can be computed efficiently. We tested the method on large handwritten character databases provided by the Post Office and the NIST. Using invariances with respect to translation, rotation, scaling,shearing and line thickness, the method consistently outperformed all other systems tested on the same databases.

artificial intelligence, post office, tangent distance, (18 more...)

Country: North America > United States (0.94)

Industry:

Government > Post Office (0.66)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

Neural Information Processing SystemsDec-31-1993

Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors

LeCun, Yann, Simard, Patrice Y., Pearlmutter, Barak

Inst., 19600 NW vonNeumann Dr, Beaverton, OR 97006 Abstract We propose a very simple, and well principled way of computing the optimal step size in gradient descent algorithms. The online version is very efficient computationally, and is applicable to large backpropagation networks trained on large data sets. The main ingredient is a technique for estimating the principal eigenvalue(s) and eigenvector(s) of the objective function's second derivative matrix (Hessian),which does not require to even calculate the Hessian. Severalother applications of this technique are proposed for speeding up learning, or for eliminating useless parameters. 1 INTRODUCTION Choosing the appropriate learning rate, or step size, in a gradient descent procedure such as backpropagation, is simultaneously one of the most crucial and expertintensive partof neural-network learning. We propose a method for computing the best step size which is both well-principled, simple, very cheap computationally, and, most of all, applicable to online training with large networks and data sets.

artificial intelligence, hessian, neural network, (14 more...)

Country:

North America > United States > Oregon > Washington County > Beaverton (0.24)
North America > United States > Massachusetts > Hampshire County (0.14)

Industry: Education (0.74)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.58)

Tangent Prop - A formalism for specifying selected invariances in an adaptive network

Simard, Patrice, Victorri, Bernard, LeCun, Yann, Denker, John

In many machine learning applications, one has access, not only to training data, but also to some high-level a priori knowledge about the desired behavior of the system. For example, it is known in advance that the output of a character recognizer should be invariant with respect to small spatial distortions of the input images (translations, rotations, scale changes, etcetera). We have implemented a scheme that allows a network to learn the derivative of its outputs with respect to distortion operators of our choosing. This not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations we wish the network to perform. 1 INTRODUCTION In machine learning, one very often knows more about the function to be learned than just the training data. An interesting case is when certain directional derivatives of the desired function are known at certain points.

artificial intelligence, neural network, tangent vector, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Multi-Digit Recognition Using a Space Displacement Neural Network

Matan, Ofer, Burges, Christopher J. C., LeCun, Yann, Denker, John S.

This is an extension of previous work on recognizing isolated digits.

artificial intelligence, neural network, recognition, (14 more...)

Country: North America > United States (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Tangent Prop - A formalism for specifying selected invariances in an adaptive network

Simard, Patrice, Victorri, Bernard, LeCun, Yann, Denker, John

In many machine learning applications, one has access, not only to training data, but also to some high-level a priori knowledge about the desired behavior ofthe system. For example, it is known in advance that the output of a character recognizer should be invariant with respect to small spatial distortionsof the input images (translations, rotations, scale changes, etcetera). We have implemented a scheme that allows a network to learn the derivative ofits outputs with respect to distortion operators of our choosing. This not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations we wish the network to perform. 1 INTRODUCTION In machine learning, one very often knows more about the function to be learned than just the training data. An interesting case is when certain directional derivatives ofthe desired function are known at certain points.

neural network, tangent vector, transformation, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Multi-Digit Recognition Using a Space Displacement Neural Network

Matan, Ofer, Burges, Christopher J. C., LeCun, Yann, Denker, John S.

We present a feed-forward network architecture for recognizing an unconstrained handwritten multi-digit string. This is an extension of previous work on recognizing isolated digits. In this architecture a single digit recognizer is replicated over the input. The output layer of the network is coupled to a Viterbi alignment module that chooses the best interpretation of the input. Training errors are propagated through the Viterbi module.

artificial intelligence, neural network, recognition, (15 more...)

Country: North America > United States (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Tangent Prop - A formalism for specifying selected invariances in an adaptive network

Simard, Patrice, Victorri, Bernard, LeCun, Yann, Denker, John

In many machine learning applications, one has access, not only to training data, but also to some high-level a priori knowledge about the desired behavior of the system. For example, it is known in advance that the output of a character recognizer should be invariant with respect to small spatial distortions of the input images (translations, rotations, scale changes, etcetera). We have implemented a scheme that allows a network to learn the derivative of its outputs with respect to distortion operators of our choosing. This not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations we wish the network to perform. 1 INTRODUCTION In machine learning, one very often knows more about the function to be learned than just the training data. An interesting case is when certain directional derivatives of the desired function are known at certain points.

artificial intelligence, neural network, tangent vector, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsDec-31-1991

Second Order Properties of Error Surfaces: Learning Time and Generalization

LeCun, Yann, Kanter, Ido, Solla, Sara A.

The learning time of a simple neural network model is obtained through an analytic computation of the eigenvalue spectrum for the Hessian matrix, which describes the second order properties of the cost function in the space of coupling coefficients. The form of the eigenvalue distribution suggests new techniques for accelerating the learning process, and provides a theoretical justification for the choice of centered versus biased state variables.

artificial intelligence, eigenvalue, neural network, (16 more...)

Country: North America > United States (0.29)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)