Tempering Backpropagation Networks: Not All Weights are Created Equal
Schraudolph, Nicol N., Sejnowski, Terrence J.
Backpropagation learning algorithms typically collapse the network's structure into a single vector of weight parameters to be optimized. We suggest that their performance may be improved by utilizing the structural information instead of discarding it, and introduce a framework for "tempering" each weight accordingly. In the tempering model, activation and error signals are treated as approximately independent random variables. The characteristic scale of weight changes is then matched to that of the residuals, allowing structural properties such as a node's fan-in and fan-out to affect the local learning rate and backpropagated error. The model also permits calculation of an upper bound on the global learning rate for batch updates, which in turn leads to different update rules for bias vs. non-bias weights. This approach yields hitherto unparalleled performance on the family relations benchmark, a deep multi-layer network: for both batch learning with momentum and the delta-bar-delta algorithm, convergence at the optimal learning rate is sped up by more than an order of magnitude.
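To make the scaling idea concrete, the following NumPy sketch shows one way to read the abstract: each weight matrix receives a learning rate divided by the destination node's fan-in, while bias weights keep a separate rate. The network shape, the 1/fan-in rule, and all names below are illustrative assumptions rather than the authors' exact tempering update.

    import numpy as np

    rng = np.random.default_rng(0)

    def tempered_rates(shapes, eta=0.1):
        # One learning rate per weight matrix, scaled by the receiving node's fan-in,
        # so the characteristic size of a weight change matches that of the residuals.
        return [eta / fan_in for fan_in, _ in shapes]

    # Hypothetical two-layer network: 12 inputs -> 6 hidden units -> 1 output.
    shapes = [(12, 6), (6, 1)]
    rates = tempered_rates(shapes)      # [0.0083..., 0.0167...]
    bias_rate = 0.1                     # biases follow their own rate / update rule

    W = [rng.normal(scale=1.0 / np.sqrt(fi), size=(fi, fo)) for fi, fo in shapes]
    b = [np.zeros(fo) for _, fo in shapes]
    gW = [rng.normal(size=w.shape) for w in W]   # stand-in batch gradients
    gb = [rng.normal(size=v.shape) for v in b]

    W = [w - lr * g for w, lr, g in zip(W, rates, gW)]
    b = [v - bias_rate * g for v, g in zip(b, gb)]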
Learning with ensembles: How overfitting can be useful
Krogh, Anders
We study the characteristics of learning with ensembles. Solving exactly the simple model of an ensemble of linear students, we find surprisingly rich behaviour. For learning in large ensembles, it is advantageous to use under-regularized students, which actually over-fit the training data. Globally optimal performance can be obtained by choosing the training set sizes of the students appropriately. For smaller ensembles, optimization of the ensemble weights can yield significant improvements in ensemble generalization performance, in particular if the individual students are subject to noise in the training process. Choosing students with a wide range of regularization parameters makes this improvement robust against changes in the unknown level of noise in the training data.
1 INTRODUCTION
An ensemble is a collection of a (finite) number of neural networks or other types of predictors that are trained for the same task.
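As a rough illustration of the large-ensemble regime described above (an independent sketch, not the paper's exactly solvable linear-student model), the code below trains several lightly regularized linear students on different halves of a noisy data set and compares one student's test error with that of the ensemble average:

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 200, 20
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.5 * rng.normal(size=n)        # noisy targets

    def ridge(X, y, lam):
        # Ordinary ridge regression; a tiny lam gives an under-regularized student.
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    students = []
    for _ in range(10):
        idx = rng.choice(n, size=n // 2, replace=False)   # each student's own training set
        students.append(ridge(X[idx], y[idx], lam=1e-3))

    X_test = rng.normal(size=(1000, d))
    y_test = X_test @ w_true
    single_err = np.mean((X_test @ students[0] - y_test) ** 2)
    ensemble_err = np.mean((X_test @ np.mean(students, axis=0) - y_test) ** 2)
    print(single_err, ensemble_err)   # averaging typically beats a single over-fitted student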
Adaptive Back-Propagation in On-Line Learning of Multilayer Networks
West, Ansgar H. L., Saad, David
This research has been motivated by the dominance of the suboptimal symmetric phase in online learning of two-layer feedforward networks trained by gradient descent [2]. This trapping is emphasized for inappropriately small learning rates but exists in all training scenarios, affecting the learning process considerably. We proposed an adaptive back-propagation training algorithm [Eq.
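A hedged sketch of the mechanism suggested by the abstract: online gradient descent for a two-layer student learning from a two-layer teacher, with the error backpropagated to the hidden layer rescaled by an adjustable gain beta (beta = 1 recovers plain back-propagation). The exact form of the adaptive rule in the paper may differ; the sizes, gain value, and names below are assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    N, K = 50, 3                              # input dimension, hidden units
    W = rng.normal(scale=0.1, size=(K, N))    # student weights
    B = rng.normal(size=(K, N))               # teacher weights defining the task

    def forward(weights, x):
        h = np.tanh(weights @ x)
        return h, h.sum()                     # hidden activations, network output

    eta, beta = 0.1, 1.5                      # learning rate, back-propagation gain
    for _ in range(10_000):                   # online learning: a fresh example per step
        x = rng.normal(size=N) / np.sqrt(N)
        _, y_teacher = forward(B, x)
        h, y = forward(W, x)
        err = y - y_teacher
        delta = beta * err * (1 - h ** 2)     # backpropagated error, rescaled by beta
        W -= eta * np.outer(delta, x)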
Is Learning The n-th Thing Any Easier Than Learning The First?
This paper investigates learning in a lifelong context. Lifelong learning addresses situations in which a learner faces a whole stream of learning tasks. Such scenarios provide the opportunity to transfer knowledge across multiple learning tasks, in order to generalize more accurately from less training data. In this paper, several different approaches to lifelong learning are described, and applied in an object recognition domain. It is shown that across the board, lifelong learning approaches generalize consistently more accurately from less training data, by their ability to transfer knowledge across learning tasks.
1 Introduction
Supervised learning is concerned with approximating an unknown function based on examples. Virtually all current approaches to supervised learning assume that one is given a set of input-output examples, denoted by X, which characterize an unknown function, denoted by f.
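As a toy illustration of the transfer idea (not one of the specific approaches evaluated in the paper), the sketch below re-uses a representation assumed to have been obtained from earlier tasks in order to fit the n-th task from only a handful of examples; the random projection standing in for that representation is purely a placeholder.

    import numpy as np

    rng = np.random.default_rng(3)

    # Placeholder for a representation distilled from the previous n-1 tasks;
    # here it is just a fixed random projection followed by a nonlinearity.
    R = rng.normal(size=(64, 16))
    rep = lambda X: np.tanh(X @ R)

    # The n-th task: only 10 labelled examples. A least-squares readout is fitted
    # on the transferred representation instead of learning from raw inputs alone.
    X_new = rng.normal(size=(10, 64))
    y_new = rng.integers(0, 2, size=10) * 2.0 - 1.0
    w, *_ = np.linalg.lstsq(rep(X_new), y_new, rcond=None)
    predict = lambda X: np.sign(rep(X) @ w)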
Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks
Saad, David, Solla, Sara A.
We consider the problem of online gradient descent learning for general two-layer neural networks. An analytic solution is presented and used to investigate the role of the learning rate in controlling the evolution and convergence of the learning process. Two-layer networks with an arbitrary number of hidden units have been shown to be universal approximators [1] for N-to-one dimensional maps. We investigate the emergence of generalization ability in an online learning scenario [2], in which the couplings are modified after the presentation of each example so as to minimize the corresponding error. The resulting changes in the couplings {J} are described as a dynamical evolution; the number of examples plays the role of time.
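The setting can be simulated directly. The sketch below (an illustration under assumed network sizes and learning rate, not the paper's analytic solution) runs online gradient descent for a two-layer student learning a two-layer teacher and reports the generalization error as the example count, playing the role of time, grows:

    import numpy as np

    rng = np.random.default_rng(4)
    N, K, eta = 100, 2, 0.5
    B = rng.normal(size=(K, N)) / np.sqrt(N)   # teacher defining the target map
    J = rng.normal(scale=0.01, size=(K, N))    # student couplings {J}

    out = lambda weights, x: np.tanh(weights @ x).sum()

    def gen_error(J, B, n_test=1000):
        X = rng.normal(size=(n_test, N))
        return 0.5 * np.mean([(out(J, x) - out(B, x)) ** 2 for x in X])

    for p in range(1, 20 * N + 1):             # each step presents one new example
        x = rng.normal(size=N)
        err = out(J, x) - out(B, x)
        J -= eta / N * np.outer(err * (1 - np.tanh(J @ x) ** 2), x)
        if p % (5 * N) == 0:
            print(p / N, gen_error(J, B))      # alpha = examples/N, generalization error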