AITopics

We study a particular type of Boltzmann machine with a bipartite graph structure called a harmonium. Our interest is in using such a machine to model a probability distribution on binary input vectors. We analyze the class of probability distributions that can be modeled by such machines.

boltzmann machine, harmonium model, vector, (15 more...)

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.37)

Bertoni, Alberto, Campadelli, Paola, Morpurgo, Anna, Panizza, Sandra

Polynomial Uniform Convergence of Relative Frequencies to Probabilities

We define the concept of polynomial uniform convergence of relative frequencies to probabilities in the distribution-dependent context.

probability, relative frequency, uniform convergence, (10 more...)

Country: Europe > Italy > Lombardy > Milan (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

Simard, Patrice, Victorri, Bernard, LeCun, Yann, Denker, John

Tangent Prop - A formalism for specifying selected invariances in an adaptive network

In many machine learning applications, one has access, not only to training data, but also to some high-level a priori knowledge about the desired behavior of the system. For example, it is known in advance that the output of a character recognizer should be invariant with respect to small spatial distortions of the input images (translations, rotations, scale changes, etcetera). We have implemented a scheme that allows a network to learn the derivative of its outputs with respect to distortion operators of our choosing. This not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations we wish the network to perform. 1 INTRODUCTION In machine learning, one very often knows more about the function to be learned than just the training data. An interesting case is when certain directional derivatives of the desired function are known at certain points.

invariance, tangent vector, transformation, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Gradient Descent: Second Order Momentum and Saturating Error

Pearlmutter, Barak

We then regard gradient descent with momentum as a dynamic system and explore a non quadratic error surface, showing that saturation of the error accounts for a variety of effects observed in simulations and justifies some popular heuristics. 1 INTRODUCTION Gradient descent is the bread-and-butter optimization technique in neural networks. Some people build special purpose hardware to accelerate gradient descent optimization of backpropagation networks. Understanding the dynamics of gradient descent on such surfaces is therefore of great practical value. Here we briefly review the known results in the convergence of batch gradient descent; show that second-order momentum does not give any speedup; simulate a real network and observe some effect not predicted by theory; and account for these effects by analyzing gradient descent with momentum on a saturating error surface.

convergence, gradient descent, momentum, (11 more...)

Country: North America > United States > Connecticut > New Haven County > New Haven (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Threshold Network Learning in the Presence of Equivalences

Shawe-Taylor, John

This paper applies the theory of Probably Approximately Correct (PAC) learning to multiple output feedforward threshold networks in which the weights conform to certain equivalences. It is shown that the sample size for reliable learning can be bounded above by a formula similar to that required for single output networks with no equivalences. The best previously obtained bounds are improved for all cases.

equivalence network, node, probability, (12 more...)

Country: North America > United States > New York > Monroe County > Rochester (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Alspector, Joshua, Jayakumar, Anthony, Luna, Stephan

Experimental Evaluation of Learning in a Neural Microsystem

Joshua Alspector Anthony Jayakumar Stephan Lunat Bellcore Morristown, NJ 07962-1910 Abstract We report learning measurements from a system composed of a cascadable learning chip, data generators and analyzers for training pattern presentation, and an X-windows based software interface. The 32 neuron learning chip has 496 adaptive synapses and can perform Boltzmann and mean-field learning using separate noise and gain controls.

alspector, learning, neuron, (11 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Constant-Time Loading of Shallow 1-Dimensional Networks

Judd, Stephen

The complexity of learning in shallow I-Dimensional neural networks has been shown elsewhere to be linear in the size of the network. However, when the network has a huge number of units (as cortex has) even linear time might be unacceptable. Furthermore, the algorithm that was given to achieve this time was based on a single serial processor and was biologically implausible. In this work we consider the more natural parallel model of processing and demonstrate an expected-time complexity that is constant (i.e.

assignment, constant-time loading, node function, (13 more...)

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Haussler, David, Kearns, Michael, Opper, Manfred, Schapire, Robert

Estimating Average-Case Learning Curves Using Bayesian, Statistical Physics and VC Dimension Methods

In this paper we investigate an average-case model of concept learning, and give results that place the popular statistical physics and VC dimension theories of learning curve behavior in a common framework.

algorithm, gibbs algorithm, probability, (13 more...)

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > New Jersey (0.05)
North America > United States > New York (0.04)
Europe > Germany (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems

Moody, John E.

We present an analysis of how the generalization performance (expected test set error) relates to the expected training set error for nonlinear learning systems, such as multilayer perceptrons and radial basis functions.

akaike, effective number, peff, (13 more...)

Country:

North America > United States > New York (0.05)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

Bayesian Model Comparison and Backprop Nets

MacKay, David J. C.

The Bayesian model comparison framework is reviewed, and the Bayesian Occam's razor is explained. This framework can be applied to feedforward networks, making possible (1) objective comparisons between solutions using alternative network architectures; (2) objective choice of magnitude and type of weight decay terms; (3) quantified estimates of the error bars on network parameters and on network output. The framework also generates a measure of the effective number of parameters determined by the data. The relationship of Bayesian model comparison to recent work on prediction of generalisation ability (Guyon et al., 1992, Moody, 1992) is discussed.

error bar, inference, occam factor, (12 more...)

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)