AITopics

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Vision (0.67)

Finnoff, W., Hergert, F., Zimmermann, H. G.

Extended Regularization Methods for Nonconvergent Model Selection

Many techniques for model selection in the field of neural networks correspond to well established statistical methods. The method of'stopped training', on the other hand, in which an oversized network is trained until the error on a further validation set of examples deteriorates, then training is stopped, is a true innovation, since model selection doesn't require convergence of the training process. In this paper we show that this performance can be significantly enhanced by extending the'non convergent model selection method' of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty term methods in which the weighting of the penalty term is adjusted during the training process. 1 INTRODUCTION One of the central topics in the field of neural networks is that of model selection. Both the theoretical and practical side of this have been intensively investigated and a vast array of methods have been suggested to perform this task. A widely used class of techniques starts by choosing an'oversized' network architecture then either removing redundant elements based on some measure of saliency (pruning), adding a further term to the cost function penalizing complexity (penalty terms), and finally, observing the error on a further validation set of examples, then stopping training as soon as this performance begins to deteriorate (stopped training).

penalty term, test variable, training process, (14 more...)

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Assessing and Improving Neural Network Predictions by the Bootstrap Algorithm

Paass, Gerhard

The bootstrap method offers an computation intensive alternative to estimate the predictive distribution for a neural network even if the analytic derivation is intractable. The available asymptotic results show that it is valid for a large number of linear, nonlinear and even nonparametric regression problems. It has the potential to model the distribution of estimators to a higher precision than the usual normal asymptotics. It even may be valid if the normal asymptotics fail. However, the theoretical properties of bootstrap procedures for neural networks - especially nonlinear models - have to be investigated more comprehensively.

bootstrap, neural network prediction, predictive distribution, (11 more...)

Country:

Europe > Germany (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

Chang, Eric I., Lippmann, Richard P.

A Boundary Hunting Radial Basis Function Classifier which Allocates Centers Constructively

A new boundary hunting radial basis function (BH-RBF) classifier which allocates RBF centers constructively near class boundaries is described. This classifier creates complex decision boundaries only in regions where confusions occur and corresponding RBF outputs are similar. A predicted square error measure is used to determine how many centers to add and to determine when to stop adding centers. Two experiments are presented which demonstrate the advantages of the BH RBF classifier. One uses artificial data with two classes and two input features where each class contains four clusters but only one cluster is near a decision region boundary.

boundary, classifier, rbf classifier, (10 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Bonnlander, Brian V., Mozer, Michael C.

Metamorphosis Networks: An Alternative to Constructive Models

Given a set oft raining examples, determining the appropriate number of free parameters is a challenging problem. Constructive learning algorithms attempt to solve this problem automatically by adding hidden units, and therefore free parameters, during learning. We explore an alternative class of algorithms-called metamorphosis algorithms-in which the number of units is fixed, but the number of free parameters gradually increases during learning. The architecture we investigate is composed of RBF units on a lattice, which imposes flexible constraints on the parameters of the network. Virtues of this approach include variable subset selection, robust parameter selection, multiresolution processing, and interpolation of sparse training data.

algorithm, free parameter, rbf unit, (17 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Simard, Patrice, LeCun, Yann, Denker, John S.

Efficient Pattern Recognition Using a New Transformation Distance

Memory-based classification algorithms such as radial basis functions or K-nearest neighbors typically rely on simple distances (Euclidean, dot product...), which are not particularly meaningful on pattern vectors. More complex, better suited distance measures are often expensive and rather ad-hoc (elastic matching, deformable templates). We propose a new distance measure which (a) can be made locally invariant to any set of transformations of the input and (b) can be computed efficiently. We tested the method on large handwritten character databases provided by the Post Office and the NIST. Using invariances with respect to translation, rotation, scaling, shearing and line thickness, the method consistently outperformed all other systems tested on the same databases.

prototype, tangent distance, transformation, (13 more...)

Country:

North America > United States > Colorado > Denver County > Denver (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > Netherlands > South Holland > The Hague (0.04)

Industry:

Government > Post Office (0.66)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.89)

Holographic Recurrent Networks

Plate, Tony A.

Holographic Recurrent Networks (HRNs) are recurrent networks which incorporate associative memory techniques for storing sequential structure. HRNs can be easily and quickly trained using gradient descent techniques to generate sequences of discrete outputs and trajectories through continuous spaee. The performance of HRNs is found to be superior to that of ordinary recurrent networks on these sequence generation tasks.

representation, sequence, vector, (15 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Chang, Eric I., Lippmann, Richard P.

A Boundary Hunting Radial Basis Function Classifier which Allocates Centers Constructively

A new boundary hunting radial basis function (BH-RBF) classifier which allocates RBF centers constructively near class boundaries is described. This classifier creates complex decision boundaries only in regions where confusions occur and corresponding RBF outputs are similar. A predicted square error measure is used to determine how many centers to add and to determine when to stop adding centers. Two experiments are presented which demonstrate the advantages of the BH RBF classifier. One uses artificial data with two classes and two input features where each class contains four clusters but only one cluster is near a decision region boundary.

boundary, classifier, rbf classifier, (10 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

A Parallel Gradient Descent Method for Learning in Analog VLSI Neural Networks

Alspector, J., Meir, R., Yuhas, B., Jayakumar, A., Lippe, D.

Typical methods for gradient descent in neural network learning involve calculation of derivatives based on a detailed knowledge of the network model. This requires extensive, time consuming calculations for each pattern presentation and high precision that makes it difficult to implement in VLSI. We present here a perturbation technique that measures, not calculates, the gradient. Since the technique uses the actual network as a measuring device, errors in modeling neuron activation and synaptic weights do not cause errors in gradient descent. The method is parallel in nature and easy to implement in VLSI. We describe the theory of such an algorithm, an analysis of its domain of applicability, some simulations using it and an outline of a hardware implementation.

learning, parallel gradient descent method, perturbation, (12 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > New York (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
(2 more...)

Industry: Semiconductors & Electronics (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.94)

Kirk, David B., Kerns, Douglas, Fleischer, Kurt, Barr, Alan H.

Analog VLSI Implementation of Multi-dimensional Gradient Descent

The implementation uses noise injection and multiplicative correlation to estimate derivatives, as in [Anderson, Kerns 92]. One intended application of this technique is setting circuit parameters on-chip automatically, rather than manually [Kirk 91]. Gradient descent optimization may be used to adjust synapse weights for a backpropagation or other on-chip learning implementation. The approach combines the features of continuous multidimensional gradient descent and the potential for an annealing style of optimization. We present data measured from our analog VLSI implementation. 1 Introduction This work is similar to [Anderson, Kerns 92], but represents two advances. First, we describe the extension of the technique to multiple dimensions. Second, we demonstrate an implementation of the multidimensional technique in analog VLSI, and provide results measured from the chip. Unlike previous work using noise sources in adaptive systems, we use the noise as a means of estimating the gradient of a function f(y), rather than performing an annealing process [Alspector 88]. We also estimate gr-;:dients continuously in position and time, in contrast to [Umminger 89] and [J abri 91], which utilize discrete position gradient estimates.

analog vlsi implementation, gradient descent, gradient estimate, (11 more...)