AITopics

A set of labeled training samples is provided, and a network must be obtained which is then expected to correctly classify previously unseen inputs. In this context, a central problem is to estimate the amount of training data needed to guarantee satisfactory learning performance. To study this question, it is necessary to first formalize the notion of learning from examples. One such formalization is based on the paradigm of probably approximately correct (PAC) learning, due to Valiant (1984). In this framework, one starts by fitting some function /, chosen from a predetermined class F, to the given training data. The class F is often called the "hypothesis class", and for purposes of this discussion it will be assumed that the functions in F take binary values {O, I} and are defined on a common domain X.

architecture, dimension, vc dimension, (16 more...)

Country:

North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.90)

Sollich, Peter, Krogh, Anders

Learning with ensembles: How overfitting can be useful

We study the characteristics of learning with ensembles. Solving exactly the simple model of an ensemble of linear students, we find surprisingly rich behaviour. For learning in large ensembles, it is advantageous to use under-regularized students, which actually over-fit the training data. Globally optimal performance can be obtained by choosing the training set sizes of the students appropriately. For smaller ensembles, optimization of the ensemble weights can yield significant improvements in ensemble generalization performance, in particular if the individual students are subject to noise in the training process. Choosing students with a wide range of regularization parameters makes this improvement robust against changes in the unknown level of noise in the training data. 1 INTRODUCTION An ensemble is a collection of a (finite) number of neural networks or other types of predictors that are trained for the same task.

ensemble, generalization error, student, (17 more...)

Country: Europe > Denmark > Capital Region > Copenhagen (0.04)

Industry: Education (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.36)

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

Kearns, Michael J.

We work in a setting in which we must choose the right number of parameters for a hypothesis function in response to a finite training sample, with the goal of minimizing the resulting generalization error. There is a large and interesting literature on cross validation methods, which often emphasizes asymptotic statistical properties, or the exact calculation of the generalization error for simple models. Our approach here is somewhat different, and is pri mari I y inspired by two sources. The first is the work of Barron and Cover [2], who introduced the idea of bounding the error of a model selection method (in their case, the Minimum Description Length Principle) in terms of a quantity known as the index of resolvability. The second is the work of Vapnik [5], who provided extremely powerful and general tools for uniformly bounding the deviations between training and generalization errors. We combine these methods to give a new and general analysis of cross validation performance. In the first and more formal part of the paper, we give a rigorous bound on the error of cross validation in terms of two parameters of the underlying model selection problem: the approximation rate and the estimation rate. In the second and more experimental part of the paper, we investigate the implications of our bound for choosing'Y, the fraction of data withheld for testing in cross validation. The most interesting aspect of this analysis is the identification of several qualitative properties of the optimal'Y that appear to be invariant over a wide class of model selection problems: - When the target function complexity is small compared to the sample size, the performance of cross validation is relatively insensitive to the choice of'Y.

cross validation, generalization error, target function, (13 more...)

Country: North America > United States > New York (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (1.00)

Amari, Shun-ichi, Murata, Noboru, Müller, Klaus-Robert, Finke, Michael, Yang, Howard Hua

Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?

A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks, trained with Kullback Leibler loss in the asymptotic case. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping we answer the question: In what ratio the examples should be divided into training and testing sets in order to obtain the optimum performance. In the non-asymptotic region cross-validated early stopping always decreases the generalization error. Our large scale simulations done on a CM5 are in nice agreement with our analytical findings.

early stopping, generalization error, stopping, (17 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
North America > United States > Illinois > Champaign County > Urbana (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.65)

Learning Model Bias

Baxter, Jonathan

In this paper the problem of learning appropriate domain-specific bias is addressed. It is shown that this can be achieved by learning many related tasks from the same domain, and a theorem is given bounding the number tasks that must be learnt.

learner, learnt, representation, (13 more...)

Country:

Oceania > Australia > South Australia (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Plasticity of Center-Surround Opponent Receptive Fields in Real and Artificial Neural Systems of Vision

Yasui, S., Furukawa, T., Yamada, M., Saito, T.

The center-surround opponent receptive field (CSRF) mechanism represents one such example. Here, analogous CSRFs are shown to be formed in an artificial neural network which learns to localize contours (edges) of the luminance difference. Furthermore, when the input pattern is corrupted by a background noise, the CSRFs of the hidden units becomes shallower and broader with decrease of the signal-to-noise ratio (SNR). The same kind of SNR-dependent plasticity is present in the CSRF of real visual neurons; in bipolar cells of the carp retina as is shown here experimentally, as well as in large monopolar cells of the fly compound eye as was described by others. Also, analogous SNRdependent plasticity is shown to be present in the biphasic flash responses (BPFR) of these artificial and biological visual systems. Thus, the spatial (CSRF) and temporal (BPFR) filtering properties with which a wide variety of creatures see the world appear to be optimized for detectability of changes in space and time. 1 INTRODUCTION A number of learning algorithms have been developed to make synthetic neural machines be trainable to function in certain optimal ways. If the brain and nervous systems that we see in nature are best answers of the evolutionary process, then one might be able to find some common'softwares' in real and artificial neural systems. This possibility is examined in this paper, with respect to a basic visual 160 S. Y ASUI, T. FURUKAWA, M. YAMADA, T. SAITO

background noise, csrf, noise, (13 more...)

Country:

Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.05)
Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.05)
North America > United States > New York (0.04)
North America > United States > District of Columbia > Washington (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Simulation of a Thalamocortical Circuit for Computing Directional Heading in the Rat

Blair, Hugh T.

Several regions of the rat brain contain neurons known as head-direction celis, which encode the animal's directional heading during spatial navigation. This paper presents a biophysical model of head-direction cell acti vity, which suggests that a thalamocortical circuit might compute the rat's head direction by integrating the angular velocity of the head over time. The model was implemented using the neural simulator NEURON, and makes testable predictions about the structure and function of the rat head-direction circuit.

ahd cell, head direction, head-direction cell, (13 more...)

Country:

North America > United States > Connecticut > New Haven County > New Haven (0.05)
North America > United States > Massachusetts > Plymouth County > Norwell (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.47)

Makeig, Scott, Bell, Anthony J., Jung, Tzyy-Ping, Sejnowski, Terrence J.

Independent Component Analysis of Electroencephalographic Data

Because of the distance between the skull and brain and their different resistivities, electroencephalographic (EEG) data collected from any point on the human scalp includes activity generated within a large brain area. This spatial smearing of EEG data by volume conduction does not involve significant time delays, however, suggesting that the Independent Component Analysis (ICA) algorithm of Bell and Sejnowski [1] is suitable for performing blind source separation on EEG data.

algorithm, correlation, independent component analysis, (15 more...)

Country:

North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > Nevada > Clark County > Las Vegas (0.04)

Industry:

Health & Medicine (0.70)
Government > Regional Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

Hasselmo, Michael E., Cekic, Milos

Cholinergic suppression of transmission may allow combined associative memory function and self-organization in the neocortex

Selective suppression of transmission at feedback synapses during learning is proposed as a mechanism for combining associative feedback with self-organization of feed forward synapses. Experimental data demonstrates cholinergic suppression of synaptic transmission in layer I (feedback synapses), and a lack of suppression in layer IV (feedforward synapses). A network with this feature uses local rules to learn mappings which are not linearly separable. During learning, sensory stimuli and desired response are simultaneously presented as input. Feedforward connections form self-organized representations of input, while suppressed feedback connections learn the transpose of feedforward connectivity. During recall, suppression is removed, sensory input activates the self-organized representation, and activity generates the learned response.

suppression, synaptic transmission, transmission, (14 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.44)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.44)

Kempter, Richard, Gerstner, Wulfram, Hemmen, J. Leo van, Wagner, Hermann

Temporal coding in the sub-millisecond range: Model of barn owl auditory pathway

Binaural coincidence detection is essential for the localization of external sounds and requires auditory signal processing with high temporal precision. We present an integrate-and-fire model of spike processing in the auditory pathway of the barn owl. It is shown that a temporal precision in the microsecond range can be achieved with neuronal time constants which are at least one magnitude longer. An important feature of our model is an unsupervised Hebbian learning rule which leads to a temporal fine tuning of the neuronal connections.

neuron, precision, spike, (13 more...)

Country: Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.07)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)