AITopics

Using a statistical mechanical formalism we calculate the evidence, generalisation error and consistency measure for a linear perceptron trained and tested on a set of examples generated by a nonlinear teacher. The teacher is said to be unrealisable because the student can never model it without error. Our model allows us to interpolate between the known case of a linear teacher and an unrealisable, nonlinear teacher. A comparison of the hyperparameters which maximise the evidence with those that optimise the performance measures reveals that, in the nonlinear case, the evidence procedure is a misleading guide to optimising performance. Finally, we explore the extent to which the evidence procedure is unreliable and find that, despite being sub-optimal, in some circumstances it might be a useful method for fixing the hyperparameters.

1 INTRODUCTION

The analysis of supervised learning, or learning from examples, is a major field of research within neural networks.

evidence procedure, generalisation error, performance measure, (13 more...)

Country:

Europe > United Kingdom (0.14)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Deco, Gustavo, Brauer, Wilfried

Higher Order Statistical Decorrelation without Information Loss

A neural network learning paradigm based on information theory is proposed as a way to perform, in an unsupervised fashion, redundancy reduction among the elements of the output layer without loss of information from the sensory input. The model developed performs nonlinear decorrelation up to higher orders of the cumulant tensors and results in probabilistically independent components of the output layer. This means that we need not assume a Gaussian distribution at either the input or the output. The theory presented is related to the unsupervised-learning theory of Barlow, which proposes redundancy reduction as the goal of cognition. When nonlinear units are used, nonlinear principal component analysis is obtained.

architecture, information, transformation, (10 more...)

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
North America > United States > New York (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Krogh, Anders, Vedelsby, Jesper

Neural Network Ensembles, Cross Validation, and Active Learning

It is well known that a combination of many different predictors can improve predictions. In the neural networks community, "ensembles" of neural networks have been investigated by several authors, see for instance [1, 2, 3]. Most often the networks in the ensemble are trained individually and then their predictions are combined. This combination is usually done by majority (in classification) or by simple averaging (in regression), but one can also use a weighted combination of the networks.
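The three combination rules named in this abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names `majority_vote`, `simple_average` and `weighted_average` are hypothetical, and the weights are assumed to be non-negative and are normalised inside the function.

```python
from collections import Counter


def majority_vote(labels):
    """Classification: return the label predicted by the most ensemble members."""
    return Counter(labels).most_common(1)[0][0]


def simple_average(outputs):
    """Regression: unweighted mean of the members' outputs."""
    return sum(outputs) / len(outputs)


def weighted_average(outputs, weights):
    """Regression: weighted mean; weights are normalised to sum to one (assumption)."""
    total = sum(weights)
    return sum(w * o for w, o in zip(weights, outputs)) / total


# Example: three hypothetical networks predicting on one input.
print(majority_vote(["cat", "dog", "cat"]))
print(simple_average([1.0, 2.0, 3.0]))
print(weighted_average([1.0, 3.0], [3.0, 1.0]))
```

Simple averaging is the special case of the weighted rule in which all weights are equal.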

ambiguity, ensemble, generalization error, (11 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.14)
Europe > Denmark > Capital Region > Kongens Lyngby (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > Middle East > Israel (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.45)

From Data Distributions to Regularization in Invariant Learning

Leen, Todd K.

Ideally, pattern recognition machines provide constant output when the inputs are transformed under a group g of desired invariances. These invariances can be achieved by enhancing the training data to include examples of inputs transformed by elements of g, while leaving the corresponding targets unchanged. Alternatively, the cost function for training can include a regularization term that penalizes changes in the output when the input is transformed under the group. This paper relates the two approaches, showing precisely the sense in which the regularized cost function approximates the result of adding transformed (or distorted) examples to the training data. The cost function for the enhanced training set is equivalent to the sum of the original cost function plus a regularizer. For unbiased models, the regularizer reduces to the intuitively obvious choice: a term that penalizes changes in the output when the inputs are transformed under the group. For infinitesimal transformations, the coefficient of the regularization term reduces to the variance of the distortions introduced into the training data. This correspondence provides a simple bridge between the two approaches.
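A small numerical sketch may make the correspondence concrete. It is not taken from the paper and rests on simplifying assumptions: a linear model, a squared-error cost, and translations of the input by plus or minus `sigma` along each coordinate axis as the transformations. In this special case the cost on the enhanced training set equals the original cost plus a regularizer whose coefficient is the per-coordinate variance of the shifts. All function names are hypothetical.

```python
def squared_error(w, x, t):
    """Squared error of the linear model y = w . x against target t."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return (y - t) ** 2


def augmented_cost(w, data, sigma):
    """Mean cost over copies of each input shifted by +/- sigma along each axis.

    Targets are left unchanged, as in the data-enhancement approach.
    """
    total, count = 0.0, 0
    for x, t in data:
        for j in range(len(x)):
            for sign in (1.0, -1.0):
                shifted = list(x)
                shifted[j] += sign * sigma
                total += squared_error(w, shifted, t)
                count += 1
    return total / count


def regularized_cost(w, data, sigma):
    """Original mean cost plus (variance of the shifts) * squared norm of w."""
    base = sum(squared_error(w, x, t) for x, t in data) / len(data)
    variance = sigma ** 2 / len(w)  # per-coordinate variance of the shifts
    return base + variance * sum(wi ** 2 for wi in w)


# Hypothetical toy data: (input, target) pairs and a weight vector.
data = [([1.0, 2.0], 0.5), ([-1.0, 0.5], -0.2), ([0.3, -0.7], 1.0)]
w = [0.4, -1.1]
print(augmented_cost(w, data, 0.1), regularized_cost(w, data, 0.1))
```

For this linear model the two costs agree up to floating-point rounding; for nonlinear models the abstract's statement concerns the regime of infinitesimal transformations.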

cost function, regularizer, transformation, (14 more...)

Country: North America > United States > Oregon > Washington County > Beaverton (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.49)

Kowalczyk, Adam, Ferrá, Herman L.

Generalisation in Feedforward Networks

They provide in particular some theoretical bounds on the sample complexity, i.e. a minimal number of training samples assuring the desired accuracy with the desired confidence. However, there are a few obvious deficiencies in these results: (i) the sample complexity bounds are unrealistically high (cf. Section 4), and (ii) for some networks they do not hold at all since the VC-dimension is infinite, e.g.

prh, sample complexity, vc-dimension, (14 more...)

Country: Oceania > Australia (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning in large linear perceptrons and why the thermodynamic limit is relevant to the real world

Sollich, Peter

We first rederive the known results for the 'thermodynamic limit' of infinite perceptron size N and show explicitly that 9

correction, generalization error, thermodynamic limit, (13 more...)

Country:

North America > United States > New York (0.05)
North America > United States > Indiana > Grant County > Marion (0.04)
Europe > United Kingdom (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.64)

Wang, Deliang, Terman, David

Synchrony and Desynchrony in Neural Oscillator Networks

A novel class of locally excitatory, globally inhibitory oscillator networks is proposed.

oscillator, scene segmentation, synchronization, (13 more...)

Country:

North America > United States > Ohio > Franklin County > Columbus (0.05)
Europe > Germany > Lower Saxony > Gottingen (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

On the Computational Complexity of Networks of Spiking Neurons

Maass, Wolfgang

We investigate the computational power of a formal model for networks of spiking neurons, both for the assumption of an unlimited timing precision, and for the case of a limited timing precision. We also prove upper and lower bounds for the number of examples that are needed to train such networks.

computational power, neural net, neuron, (13 more...)

Country:

Europe > Austria > Styria > Graz (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.99)

Zemel, Richard S., Sejnowski, Terrence J.

Grouping Components of Three-Dimensional Moving Objects in Area MST of Visual Cortex

Previous investigators have suggested that these cells may represent self-motion. Spiral patterns can also be generated by the relative motion of the observer and a particular object. An MST cell may then account for some portion of the complex flow field, and the set of active cells could encode the entire flow; in this manner, MST effectively segments moving objects. Such a grouping operation is essential in interpreting scenes containing several independent moving objects and observer motion. We describe a model based on the hypothesis that the selective tuning of MST cells reflects the grouping of object components undergoing coherent motion. Inputs to the model were generated from sequences of ray-traced images that simulated realistic motion situations, combining observer motion, eye movements, and independent object motion. The input representation was modeled after response properties of neurons in area MT, which provides the primary input to area MST. After applying an unsupervised learning algorithm, the units became tuned to patterns signaling coherent motion. The results match many of the known properties of MST cells and are consistent with recent studies indicating that these cells process 3-D object motion information.

flow field, grouping component, three-dimensional, (16 more...)