AITopics

We present a unified framework for a number of different ways of failing to generalize properly.

chaitin-kolmogorov complexity and generalization, ensemble, training data, (11 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > United States > California > San Mateo County > San Mateo (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.42)

LeCun, Yann, Kanter, Ido, Solla, Sara A.

Second Order Properties of Error Surfaces: Learning Time and Generalization

The learning time of a simple neural network model is obtained through an analytic computation of the eigenvalue spectrum for the Hessian matrix, which describes the second order properties of the cost function in the space of coupling coefficients. The form of the eigenvalue distribution suggests new techniques for accelerating the learning process, and provides a theoretical justification for the choice of centered versus biased state variables.

eigenvalue, matrix, second order property, (13 more...)

Country:

North America > United States > New Jersey (0.04)
North America > United States > California (0.04)
Asia > Middle East > Israel (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Cohn, David, Tesauro, Gerald

Can neural networks do better than the Vapnik-Chervonenkis bounds?

These experiments are designed to test whether average generalization performance can surpass the worst-case bounds obtained from formal learning theory using the Vapnik-Chervonenkis dimension (Blumer et al., 1989). We indeed find that, in some cases, the average generalization is significantly better than the VC bound: the approach to perfect performance is exponential in the number of examples m, rather than the 11m result of the bound. In other cases, we do find the 11m behavior of the VC bound, and in these cases, the numerical prefactor is closely related to prefactor contained in the bound.

artificial data, experiment, generalization, (16 more...)

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)

Baum, Eric B., Lang, Kevin J.

Constructing Hidden Units using Examples and Queries

While the network loading problem for 2-layer threshold nets is NPhard when learning from examples alone (as with backpropagation), (Baum, 91) has now proved that a learner can employ queries to evade the hidden unit credit assignment problem and PACload nets with up to four hidden units in polynomial time. Empirical tests show that the method can also learn far more complicated functions such as randomly generated networks with 200 hidden units. The algorithm easily approximates Wieland's 2-spirals function using a single layer of 50 hidden units, and requires only 30 minutes of CPU time to learn 200-bit parity to 99.7% accuracy.

constructing hidden unit, plane, query, (15 more...)

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Krogh, Anders, Hertz, John A.

Dynamics of Generalization in Linear Perceptrons

We study the evolution of the generalization ability of a simple linear perceptron with N inputs which learns to imitate a "teacher perceptron". The system is trained on p aN binary example inputs and the generalization ability measured by testing for agreement with the teacher on all 2N possible binary input patterns. The dynamics may be solved analytically and exhibits a phase transition from imperfect to perfect generalization at a 1. Except at this point the generalization ability approaches its asymptotic value exponentially, with critical slowing down near the transition; the relaxation time is ex (1 - y'a)-2.

generalization, generalization ability, perfect generalization, (12 more...)

Country:

Europe > Denmark > Capital Region > Copenhagen (0.05)
Asia > Singapore (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.84)

Generalization Dynamics in LMS Trained Linear Networks

Chauvin, Yves

Recent progress in network design demonstrates that nonlinear feedforward neural networks can perform impressive pattern classification for a variety of real-world applications (e.g., Le Cun et al., 1990; Waibel et al., 1989). Various simulations and relationships between the neural network and machine learning theoretical literatures also suggest that too large a number of free parameters ("weight overfitting") could substantially reduce generalization performance.

error component, validation dynamic, validation error, (15 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Germany > Berlin (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Weigend, Andreas S., Rumelhart, David E., Huberman, Bernardo A.

Generalization by Weight-Elimination with Application to Forecasting

Inspired by the information theoretic idea of minimum description length, we add a term to the back propagation cost function that penalizes network complexity. We give the details of the procedure, called weight-elimination, describe its dynamics, and clarify the meaning of the parameters involved. From a Bayesian perspective, the complexity term can be usefully interpreted as an assumption about prior distribution of the weights. We use this procedure to predict the sunspot time series and the notoriously noisy series of currency exchange rates. 1 INTRODUCTION Learning procedures for connectionist networks are essentially statistical devices for performing inductive inference. There is a tradeoff between two goals: on the one hand, we want such devices to be as general as possible so that they are able to learn a broad range of problems.

complexity term, rumelhart, weight-elimination, (16 more...)

Country:

North America > United States > California > Santa Clara County > Stanford (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Government (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Biswas, Sanjay, Venkatesh, Santosh S.

The Devil and the Network: What Sparsity Implies to Robustness and Memory

Robustness is a commonly bruited property of neural networks; in particular, a folk theorem in neural computation asserts that neural networks-in contexts with large interconnectivity-continue to function efficiently, albeit with some degradation, in the presence of component damage or loss. A second folk theorem in such contexts asserts that dense interconnectivity between neural elements is a sine qua non for the efficient usage of resources. These premises are formally examined in this communication in a setting that invokes the notion of the "devil"

multiset, outer-product algorithm, vector, (13 more...)

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > California > San Diego County > San Diego (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.56)

Closed-Form Inversion of Backpropagation Networks: Theory and Optimization Issues

Rossen, Michael L.

We describe a closed-form technique for mapping the output of a trained backpropagation network int.o input activity space. The mapping is an verse in mapping in the sense that, when the image of the mapping in input activity space is propagat.ed

inverse mapping image, mapping, nullspace, (14 more...)

Country: North America > United States > California > San Diego County > San Diego (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.64)

Denker, John S., LeCun, Yann

Transforming Neural-Net Output Levels to Probability Distributions

John S. Denker and Yann leCun AT&T Bell Laboratories Holmdel, NJ 07733 Abstract (1) The outputs of a typical multi-output classification network do not satisfy the axioms of probability; probabilities should be positive and sum to one. This problem can be solved by treating the trained network as a preprocessor that produces a feature vector that can be further processed, for instance by classical statistical estimation techniques. It is particularly useful to combine these two ideas: we implement the ideas of section 1 using Parzen windows, where the shape and relative size of each window is computed using the ideas of section 2. This allows us to make contact between important theoretical ideas (e.g. the ensemble formalism) and practical techniques (e.g. Our results also shed new light on and generalize the well-known "soft max" scheme. For example, in speech recognition, these numbers represent the probability of C different phonemes; the probabilities of successive segments can be combined using a Hidden Markov Model.

probability, training data, transforming neural-net output level, (11 more...)

Country: North America > United States > District of Columbia > Washington (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)