AITopics

This paper is concerned with the problem of learning in networks where some or all of the functions involved are not smooth. Examples of such networks are those whose neural transfer functions are piecewise-linear and those whose error function is defined in terms of the 100 norm. Up to now, networks whose neural transfer functions are piecewise-linear have received very little consideration in the literature, but the possibility of using an error function defined in terms of the 100 norm has received some attention. In this paper we draw upon some recent results from the field of nonsmooth optimization (NSO) to present an algorithm for the non smooth case. Our motivation for this work arose out of the fact that we have been able to show that, in backpropagation, an error function based upon the 100 norm overcomes the difficulties which can occur when using the 12 norm. 1 INTRODUCTION This paper is concerned with the problem of learning in networks where some or all of the functions involved are not smooth.

approximation, error function, function and subgradient evaluation, (13 more...)

Country:

North America > United States > New York > New York County > New York City (0.05)
Oceania > Australia > Queensland (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Oceania > Australia > South Australia > Adelaide (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Krogh, Anders, Hertz, John A.

A Simple Weight Decay Can Improve Generalization

It has been observed in numerical simulations that a weight decay can improve generalization in a feed-forward neural network.

generalization error, vector, weight decay, (12 more...)

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.05)
North America > United States > New York (0.04)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Bertoni, Alberto, Campadelli, Paola, Morpurgo, Anna, Panizza, Sandra

Polynomial Uniform Convergence of Relative Frequencies to Probabilities

We define the concept of polynomial uniform convergence of relative frequencies to probabilities in the distribution-dependent context.

probability, relative frequency, uniform convergence, (10 more...)

Country: Europe > Italy > Lombardy > Milan (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

Simard, Patrice, Victorri, Bernard, LeCun, Yann, Denker, John

Tangent Prop - A formalism for specifying selected invariances in an adaptive network

In many machine learning applications, one has access, not only to training data, but also to some high-level a priori knowledge about the desired behavior of the system. For example, it is known in advance that the output of a character recognizer should be invariant with respect to small spatial distortions of the input images (translations, rotations, scale changes, etcetera). We have implemented a scheme that allows a network to learn the derivative of its outputs with respect to distortion operators of our choosing. This not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations we wish the network to perform. 1 INTRODUCTION In machine learning, one very often knows more about the function to be learned than just the training data. An interesting case is when certain directional derivatives of the desired function are known at certain points.

invariance, tangent vector, transformation, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Haussler, David, Kearns, Michael, Opper, Manfred, Schapire, Robert

Estimating Average-Case Learning Curves Using Bayesian, Statistical Physics and VC Dimension Methods

In this paper we investigate an average-case model of concept learning, and give results that place the popular statistical physics and VC dimension theories of learning curve behavior in a common framework.

algorithm, gibbs algorithm, probability, (13 more...)

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > New Jersey (0.05)
North America > United States > New York (0.04)
Europe > Germany (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems

Moody, John E.

We present an analysis of how the generalization performance (expected test set error) relates to the expected training set error for nonlinear learning systems, such as multilayer perceptrons and radial basis functions.

akaike, effective number, peff, (13 more...)

Country:

North America > United States > New York (0.05)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

Bayesian Model Comparison and Backprop Nets

MacKay, David J. C.

The Bayesian model comparison framework is reviewed, and the Bayesian Occam's razor is explained. This framework can be applied to feedforward networks, making possible (1) objective comparisons between solutions using alternative network architectures; (2) objective choice of magnitude and type of weight decay terms; (3) quantified estimates of the error bars on network parameters and on network output. The framework also generates a measure of the effective number of parameters determined by the data. The relationship of Bayesian model comparison to recent work on prediction of generalisation ability (Guyon et al., 1992, Moody, 1992) is discussed.

error bar, inference, occam factor, (12 more...)

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Principles of Risk Minimization for Learning Theory

Vapnik, V.

Learning is posed as a problem of function estimation, for which two principles of solution are considered: empirical risk minimization and structural risk minimization. These two principles are applied to two different statements of the function estimation problem: global and local. Systematic improvements in prediction power are illustrated in application to zip-code recognition.

algorithm, minimization, risk minimization, (13 more...)

Country:

North America > United States > New York (0.04)
North America > United States > California (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Constrained Optimization Applied to the Parameter Setting Problem for Analog Circuits

Kirk, David, Fleischer, Kurt, Watts, Lloyd, Barr, Alan

We use constrained optimization to select operating parameters for two circuits: a simple 3-transistor square root circuit, and an analog VLSI artificial cochlea. This automated method uses computer controlled measurement and test equipment to choose chip parameters which minimize the difference between the actual circuit's behavior and a specified goal behavior. Choosing the proper circuit parameters is important to compensate for manufacturing deviations or adjust circuit performance within a certain range. As biologically-motivated analog VLSI circuits become increasingly complex, implying more parameters, setting these parameters by hand will become more cumbersome. Thus an automated parameter setting method can be of great value [Fleischer 90].

constrained optimization applied, error metric, optimization, (13 more...)

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Semiconductors & Electronics (0.57)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Platt, John C., Faggin, Federico

Networks for the Separation of Sources that are Superimposed and Delayed

We have created new networks to unmix signals which have been mixed either with time delays or via filtering. We first show that a subset of the Herault-Jutten learning rules fulfills a principle of minimum output power. We then apply this principle to extensions of the Herault-Jutten network which have delays in the feedback path. Our networks perform well on real speech and music signals that have been mixed using time delays or filtering.

herault-jutten network, pure signal, separation, (13 more...)

Country:

North America > United States > Utah (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Colorado (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)