AITopics

Inspired by the information theoretic idea of minimum description length, we add a term to the back propagation cost function that penalizes network complexity. We give the details of the procedure, called weight-elimination, describe its dynamics, and clarify the meaning of the parameters involved. From a Bayesian perspective, the complexity term can be usefully interpreted as an assumption about prior distribution of the weights. We use this procedure to predict the sunspot time series and the notoriously noisy series of currency exchange rates. 1 INTRODUCTION Learning procedures for connectionist networks are essentially statistical devices for performing inductive inference. There is a tradeoff between two goals: on the one hand, we want such devices to be as general as possible so that they are able to learn a broad range of problems.

complexity term, rumelhart, weight-elimination, (16 more...)

Country:

North America > United States > California > Santa Clara County > Stanford (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Government (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Biswas, Sanjay, Venkatesh, Santosh S.

The Devil and the Network: What Sparsity Implies to Robustness and Memory

Robustness is a commonly bruited property of neural networks; in particular, a folk theorem in neural computation asserts that neural networks-in contexts with large interconnectivity-continue to function efficiently, albeit with some degradation, in the presence of component damage or loss. A second folk theorem in such contexts asserts that dense interconnectivity between neural elements is a sine qua non for the efficient usage of resources. These premises are formally examined in this communication in a setting that invokes the notion of the "devil"

multiset, outer-product algorithm, vector, (13 more...)

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > California > San Diego County > San Diego (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.56)

Closed-Form Inversion of Backpropagation Networks: Theory and Optimization Issues

Rossen, Michael L.

We describe a closed-form technique for mapping the output of a trained backpropagation network int.o input activity space. The mapping is an verse in mapping in the sense that, when the image of the mapping in input activity space is propagat.ed

inverse mapping image, mapping, nullspace, (14 more...)

Country: North America > United States > California > San Diego County > San Diego (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.64)

Denker, John S., LeCun, Yann

Transforming Neural-Net Output Levels to Probability Distributions

John S. Denker and Yann leCun AT&T Bell Laboratories Holmdel, NJ 07733 Abstract (1) The outputs of a typical multi-output classification network do not satisfy the axioms of probability; probabilities should be positive and sum to one. This problem can be solved by treating the trained network as a preprocessor that produces a feature vector that can be further processed, for instance by classical statistical estimation techniques. It is particularly useful to combine these two ideas: we implement the ideas of section 1 using Parzen windows, where the shape and relative size of each window is computed using the ideas of section 2. This allows us to make contact between important theoretical ideas (e.g. the ensemble formalism) and practical techniques (e.g. Our results also shed new light on and generalize the well-known "soft max" scheme. For example, in speech recognition, these numbers represent the probability of C different phonemes; the probabilities of successive segments can be combined using a Hidden Markov Model.

probability, training data, transforming neural-net output level, (11 more...)

Country: North America > United States > District of Columbia > Washington (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Bilbro, Griff L., Bout, David E. van den

Learning Theory and Experiments with Competitive Networks

Raleigh, NC 27695-7914 Abstract We apply the theory of Tishby, Levin, and Sol1a (TLS) to two problems. First we analyze an elementary problem for which we find the predictions consistent with conventional statistical results. Second we numerically examine the more realistic problem of training a competitive net to learn a probability density from samples. We find TLS useful for predicting average training behavior.. 1 TLS APPLIED TO LEARNING DENSITIES Recently a theory of learning has been constructed which describes the learning of a relation from examples (Tishby, Levin, and Sol1a, 1989), (Schwarb, Samalan, Sol1a, and Denker, 1990). The original derivation relies on a statistical mechanics treatment of the probability of independent events in a system with a specified average value of an additive error function. The resulting theory is not restricted to learning relations and it is not essentially statistical mechanical.

equation 7, learning theory and experiment, target error, (13 more...)

Country:

North America > United States > North Carolina > Wake County > Raleigh (0.26)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.43)

Darken, Christian, Moody, John E.

Note on Learning Rate Schedules for Stochastic Optimization

We present and compare learning rate schedules for stochastic gradient descent, a general algorithm which includes LMS, online backpropagation and k-means clustering as special cases. We introduce "search-thenconverge" type schedules which outperform the classical constant and "running average" (1ft) schedules both in speed of convergence and quality of solution.

algorithm, exemplar, learning rate schedule, (10 more...)

Country:

North America > United States > California (0.14)
North America > United States > Connecticut > New Haven County > New Haven (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.38)

Gupta, Ajay, Maass, Wolfgang

A Method for the Efficient Design of Boltzmann Machines for Classiffication Problems

A Boltzmann machine ([AHS], [HS], [AK]) is a neural network model in which the units update their states according to a stochastic decision rule. It consists of a set U of units, a set C of unordered pairs of elements of U, and an assignment of connection strengths S: C -- R. A configuration of a Boltzmann machine is a map k: U -- {O, I}.

boltzmann machine, node, state change, (13 more...)

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > New York (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

On Stochastic Complexity and Admissible Models for Neural Network Classifiers

Smyth, Padhraic

For a detailed rationale the reader is referred to the work of Rissanen (1984) or Wallace and Freeman (1987) and the references therein. Note that the Minimum Description Length (MDL) technique (as Rissanen's approach has become known) is implicitly related to Maximum A Posteriori (MAP) Bayesian estimation techniques if cast in the appropriate framework.

admissible model, classification problem, description length, (13 more...)

Country:

North America > United States > New York (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Designing Linear Threshold Based Neural Network Pattern Classifiers

Fine, Terrence L.

The three problems that concern us are identifying a natural domain of pattern classification applications of feed forward neural networks, selecting an appropriate feedforward network architecture, and assessing the tradeoff between network complexity, training set size, and statistical reliability as measured by the probability of incorrect classification. We close with some suggestions, for improving the bounds that come from Vapnik Chervonenkis theory, that can narrow, but not close, the chasm between theory and practice. Neural networks are appropriate as pattern classifiers when the pattern sources are ones of which we have little understanding, beyond perhaps a nonparametric statistical model, but we have been provided with classified samples of features drawn from each of the pattern categories. Neural networks should be able to provide rapid and reliable computation of complex decision functions. The issue in doubt is their statistical response to new inputs.

designing linear threshold, node, probability, (11 more...)

Country: North America > United States > New York > Tompkins County > Ithaca (0.04)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Keesing, Ron, Stork, David G.

Evolution and Learning in Neural Networks: The Number and Distribution of Learning Trials Affect the Rate of Evolution

Learning can increase the rate of evolution of a population of biological organisms (the Baldwin effect). Our simulations show that in a population of artificial neural networks solving a pattern recognition problem, no learning or too much learning leads to slow evolution of the genes whereas an intermediate amount is optimal. Moreover, for a given total number of training presentations, fastest evoution occurs if different individuals within each generation receive different numbers of presentations, rather than equal numbers. Because genetic algorithms (GAs) help avoid local minima in energy functions, our hybrid learning-GA systems can be applied successfully to complex, highdimensional pattern recognition problems. INTRODUCTION The structure and function of a biological network derives from both its evolutionary precursors and real-time learning.

evolution, fitness, learning, (14 more...)

Country:

North America > United States > Michigan (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)