AITopics

Online learning is one of the most common forms of neural network training. We present an analysis of online learning from finite training sets for nonlinear networks (namely, soft-committee machines), advancing the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear networks or infinite training sets. Preliminary comparisons with simulations suggest that the theory captures some effects of finite training sets, but may not yet account correctly for the presence of local minima.
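Below is a minimal simulation sketch of the setting described above. It assumes Gaussian inputs, an erf(h/√2) hidden activation, squared error, and examples drawn with replacement from a fixed pool of P teacher-labelled examples. It simulates online learning directly and does not implement the paper's dynamical equations for the order parameters. All names and default values (n, k, p, eta, simulate) are hypothetical.

```python
import math
import random


def g(h):
    # Hidden-unit activation erf(h / sqrt(2)), a common choice for soft committee machines
    return math.erf(h / math.sqrt(2.0))


def g_prime(h):
    # Derivative of erf(h / sqrt(2)): sqrt(2 / pi) * exp(-h^2 / 2)
    return math.sqrt(2.0 / math.pi) * math.exp(-h * h / 2.0)


def committee(weights, x):
    # Output = sum of hidden activations; hidden fields are scaled by 1 / sqrt(N)
    scale = 1.0 / math.sqrt(len(x))
    fields = [scale * sum(wj * xj for wj, xj in zip(w, x)) for w in weights]
    return sum(g(h) for h in fields), fields


def online_step(student, x, y, eta):
    # One gradient step on the single-example error 0.5 * (output - y)^2
    out, fields = committee(student, x)
    scale = 1.0 / math.sqrt(len(x))
    for w, h in zip(student, fields):
        coef = eta * (out - y) * g_prime(h) * scale
        for j, xj in enumerate(x):
            w[j] -= coef * xj


def training_error(student, data):
    return sum(0.5 * (committee(student, x)[0] - y) ** 2 for x, y in data) / len(data)


def simulate(n=20, k=2, p=100, steps=20000, eta=0.5, seed=0):
    rng = random.Random(seed)
    teacher = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    # Finite training set: P fixed examples labelled by the teacher
    data = []
    for _ in range(p):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        data.append((x, committee(teacher, x)[0]))
    student = [[rng.gauss(0.0, 0.01) for _ in range(n)] for _ in range(k)]
    before = training_error(student, data)
    for _ in range(steps):
        x, y = rng.choice(data)  # examples are recycled, unlike the infinite-set case
        online_step(student, x, y, eta)
    return before, training_error(student, data)


before, after = simulate()
print(before, after)
```

Sampling from a fixed pool of examples is what distinguishes this from the infinite-training-set case, where every example would be fresh.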

equation, infinite training, order parameter

Country: Europe > United Kingdom (0.04)

Genre: Instructional Material > Online (0.50)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.94)

Smola, Alex J., Schölkopf, Bernhard

From Regularization Operators to Support Vector Kernels

Support Vector (SV) Machines for pattern recognition, regression estimation and operator inversion exploit the idea of transforming the data into a high-dimensional feature space, where they perform a linear algorithm. Instead of evaluating this map explicitly, one uses Hilbert-Schmidt kernels $k(x, y)$, which correspond to dot products of the mapped data in the high-dimensional space, i.e. $k(x, y) = (\Phi(x) \cdot \Phi(y))$.
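The identity k(x, y) = (Φ(x) · Φ(y)) can be checked directly for a kernel whose feature map is small enough to write out. The sketch below uses the homogeneous quadratic kernel on 2-D inputs. This is our own choice of example, not one taken from the paper, and the function names are hypothetical.

```python
import math


def poly2_kernel(x, y):
    # Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2
    return sum(a * b for a, b in zip(x, y)) ** 2


def poly2_features(x):
    # Explicit feature map for 2-D inputs: (x1^2, sqrt(2) * x1 * x2, x2^2)
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, y))                         # 1.0, since x . y = 3 - 2 = 1
print(dot(poly2_features(x), poly2_features(y)))  # same value up to floating-point rounding
```

The kernel evaluation costs one 2-D dot product, whereas the explicit map builds a 3-D vector. That gap grows quickly with the input dimension and the polynomial degree, which is why the map is not evaluated explicitly.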

kernel, regularization operator, sv machine

Country:

North America > United States > New York (0.05)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > Germany > Berlin (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.74)

Rattray, Magnus, Saad, David

Globally Optimal On-line Learning Rules

We present a method for determining the globally optimal online learning rule for a soft committee machine under a statistical mechanics framework. This work complements previous results on locally optimal rules, where only the rate of change in generalization error was considered. We maximize the total reduction in generalization error over the whole learning process and show how the resulting rule can significantly outperform the locally optimal rule.

1 Introduction

We consider a learning scenario in which a feed-forward neural network model (the student) emulates an unknown mapping (the teacher), given a set of training examples produced by the teacher. The performance of the student network is typically measured by its generalization error, which is the expected error on an unseen example. The aim of training is to reduce the generalization error by adapting the student network's parameters appropriately. A common form of training is online learning, where training patterns are presented sequentially and independently to the network at each learning step.
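In this statistical-mechanics framework, the generalization error is expressed through order parameters: the overlaps Q (student-student), R (student-teacher) and T (teacher-teacher), each normalized by the input dimension. For the activation g(h) = erf(h/√2), squared error and Gaussian inputs, the earlier soft-committee-machine literature gives a well-known closed form. The sketch below assumes that setting, which the abstract does not state explicitly. The function name `eps_g` is hypothetical.

```python
import math


def eps_g(Q, R, T):
    """Generalization error of a soft committee machine, assuming g(h) = erf(h / sqrt(2)),
    error 0.5 * (student output - teacher output)^2 and i.i.d. standard Gaussian inputs.

    Q[i][k]: student-student overlaps, R[i][n]: student-teacher overlaps,
    T[n][m]: teacher-teacher overlaps.
    """
    def term(c_ab, c_aa, c_bb):
        # (pi / 2) * E[g(u) g(v)] for jointly Gaussian fields u, v with these covariances
        return math.asin(c_ab / math.sqrt((1.0 + c_aa) * (1.0 + c_bb)))

    K, M = len(Q), len(T)
    ss = sum(term(Q[i][k], Q[i][i], Q[k][k]) for i in range(K) for k in range(K))
    tt = sum(term(T[n][m], T[n][n], T[m][m]) for n in range(M) for m in range(M))
    st = sum(term(R[i][n], Q[i][i], T[n][n]) for i in range(K) for n in range(M))
    return (ss + tt - 2.0 * st) / math.pi


I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
print(eps_g(I2, I2, I2))  # student identical to teacher: error 0 (up to rounding)
print(eps_g(I2, Z2, I2))  # student uncorrelated with teacher: error 2/3 (up to rounding)
```

Because the error depends on the weights only through Q and R, a learning rule can be analysed, and in principle optimized, in terms of the evolution of these few quantities.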

algorithm, generalization error, optimal rule

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Asia > Middle East > Jordan (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Instructional Material > Online (0.40)

Industry: Education > Educational Setting > Online (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Leen, Todd K., Schottky, Bernhard, Saad, David

Two Approaches to Optimal Annealing

We employ both master equation and order parameter approaches to analyze the asymptotic dynamics of online learning with different learning-rate annealing schedules. We examine the relations between the results obtained by the two approaches and obtain new results on the optimal decay coefficients and their dependence on the number of hidden nodes in a two-layer architecture.
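As a minimal, hypothetical illustration of a decaying schedule, the sketch below runs online gradient descent on the one-parameter quadratic loss 0.5(w - x_t)^2 with learning rate η_t = c/(t0 + t). This is not the paper's master-equation or order-parameter analysis, and the names `anneal_mean`, `c` and `t0` are our own. With c = 1 and t0 = 0, each update reproduces the running sample mean. Varying the decay coefficient c changes how fast the estimate approaches the mean.

```python
import random


def anneal_mean(samples, c=1.0, t0=0.0):
    """Online estimate of a mean with the annealed learning rate eta_t = c / (t0 + t).

    Each update is a gradient step on the quadratic loss 0.5 * (w - x_t) ** 2.
    """
    w = 0.0
    for t, x in enumerate(samples, start=1):
        eta = c / (t0 + t)
        w -= eta * (w - x)
    return w


# With c = 1 and t0 = 0 the estimate equals the running sample mean
print(anneal_mean([1.0, 2.0, 3.0, 4.0]))

# Different decay coefficients on the same noisy stream (true mean 3.0)
rng = random.Random(1)
stream = [rng.gauss(3.0, 1.0) for _ in range(10000)]
for c in (0.25, 1.0, 4.0):
    print(c, anneal_mean(stream, c=c))
```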

equation, generalization error, order parameter approach

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Oregon (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)

Industry: Education > Educational Setting (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Koistinen, Petri

Asymptotic Theory for Regularization: One-Dimensional Linear Case

The generalization ability of a neural network can sometimes be improved dramatically by regularization. To analyze the improvement one needs more refined results than the asymptotic distribution of the weight vector. Here we study the simple case of one-dimensional linear regression under quadratic regularization, i.e., ridge regression. We study the random design, misspecified case, where we derive expansions for the optimal regularization parameter and the ensuing improvement. It is possible to construct examples where it is best to use no regularization.
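For reference, one-dimensional ridge regression has a closed-form estimator. The sketch below assumes a model without intercept and the penalized loss Σ_i (y_i - w x_i)^2 + λw^2. The paper's normalization of the regularization parameter may differ, and `ridge_1d` is a hypothetical helper name.

```python
def ridge_1d(xs, ys, lam):
    """One-dimensional ridge regression without intercept.

    Minimises sum_i (y_i - w * x_i)^2 + lam * w^2, whose minimiser is
    w = sum_i x_i * y_i / (sum_i x_i^2 + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(ridge_1d(xs, ys, 0.0))   # lam = 0: ordinary least squares, slope 2
print(ridge_1d(xs, ys, 14.0))  # lam = sum of x^2: the slope is shrunk by half
```

Setting λ = 0 recovers least squares, and any λ > 0 shrinks the slope toward zero. Whether that shrinkage helps depends on the balance between the variance it removes and the bias it adds, which is the trade-off the asymptotic expansions quantify.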

asymptotic theory, expansion, regularization parameter

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Europe > Austria > Vienna (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.35)

Kivinen, Jyrki, Warmuth, Manfred K.

Relative Loss Bounds for Multidimensional Regression Problems

We study online generalized linear regression with multidimensional outputs, i.e., neural networks with multiple output nodes but no hidden nodes. We allow transfer functions at the final layer, such as the softmax function, that depend on the linear activations of all the output neurons. We use distance functions of a certain kind in two completely independent roles in deriving and analyzing online learning algorithms for such tasks. We use one distance function to define a matching loss function for the (possibly multidimensional) transfer function, which allows us to generalize earlier results from one-dimensional to multidimensional outputs. We use another distance function as a tool for measuring the progress made by the online updates. This shows how previously studied algorithms such as gradient descent and exponentiated gradient fit into a common framework. We evaluate the performance of the algorithms using relative loss bounds that compare the loss of the online algorithm to that of the best off-line predictor from the relevant model class, thus completely eliminating probabilistic assumptions about the data.
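The sketch below shows one online gradient-descent update for a single-layer network with a softmax output. It assumes the cross-entropy loss on one-hot targets. With this pairing, the gradient with respect to the linear activations reduces to prediction minus target, which is the kind of simplification a matching loss provides. The exponentiated-gradient update and the relative loss bounds from the abstract are not reproduced, and all names are hypothetical.

```python
import math


def softmax(a):
    m = max(a)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    return [v / s for v in e]


def predict(W, x):
    # Linear activations a = W x followed by the softmax transfer function
    return softmax([sum(wj * xj for wj, xj in zip(row, x)) for row in W])


def gd_update(W, x, y, eta):
    """One online gradient-descent step on the cross-entropy loss.

    For softmax with cross-entropy, the gradient w.r.t. the linear activations
    is (prediction - target), so no softmax Jacobian is needed.
    """
    p = predict(W, x)
    return [[wj - eta * (pk - yk) * xj for wj, xj in zip(row, x)]
            for row, pk, yk in zip(W, p, y)]


W0 = [[0.0, 0.0] for _ in range(3)]  # 3 output nodes, 2 inputs, no hidden nodes
x, y = [1.0, 2.0], [1.0, 0.0, 0.0]   # one example with a one-hot target
W1 = gd_update(W0, x, y, 0.5)
print(predict(W0, x)[0], predict(W1, x)[0])  # probability of the true class rises
```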

algorithm, loss function, transfer function

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Hyvärinen, Aapo

New Approximations of Differential Entropy for Independent Component Analysis and Projection Pursuit

We derive a first-order approximation of the density of maximum entropy for a continuous 1-D random variable, given a number of simple constraints. This results in a density expansion that is somewhat similar to the classical polynomial density expansions by Gram-Charlier and Edgeworth. Using this approximation of the density, an approximation of 1-D differential entropy is derived. The approximation of entropy is both more exact and more robust against outliers than the classical approximation based on the polynomial density expansions, without being computationally more expensive. The approximation has applications, for example, in independent component analysis and projection pursuit.

1 Introduction

The basic information-theoretic quantity for continuous one-dimensional random variables is differential entropy. The differential entropy $H$ of a scalar random variable $X$ with density $f(x)$ is defined as $H(X) = -\int f(x) \log f(x) \, dx$.
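The definition of differential entropy can be checked numerically. The sketch below evaluates H = -∫ f(x) log f(x) dx with the trapezoidal rule and compares the result with the closed form 0.5 log(2πe) for a unit-variance Gaussian, which has the largest differential entropy among all densities with that variance. It computes the integral directly and does not implement the paper's approximations. The function names and integration limits are our own choices.

```python
import math


def differential_entropy(f, lo, hi, n=100000):
    """Approximate H = -integral of f(x) log f(x) dx over [lo, hi] with the trapezoidal rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        fx = f(lo + i * h)
        val = -fx * math.log(fx) if fx > 0.0 else 0.0  # 0 log 0 is taken as 0
        total += (0.5 if i in (0, n) else 1.0) * val
    return total * h


def gaussian_pdf(x, sigma=1.0):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


numeric = differential_entropy(gaussian_pdf, -12.0, 12.0)
exact = 0.5 * math.log(2.0 * math.pi * math.e)  # closed form for a unit-variance Gaussian
print(numeric, exact)
```

Computing this integral requires the full density. Approximations of the kind described in the abstract aim to replace it with a few cheap expectations that can be estimated from samples.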

approximation, entropy, projection pursuit

Country:

Europe > Finland > Uusimaa > Helsinki (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.53)

Heskes, Tom

Selecting Weighting Factors in Logarithmic Opinion Pools

Simple linear averaging of the outputs of several networks, as in bagging [3], seems to follow naturally from a bias/variance decomposition of the sum-squared error. The sum-squared error of the average model is a quadratic function of the weighting factors assigned to the networks in the ensemble [7], suggesting a quadratic programming algorithm for finding the "optimal" weighting factors. If we interpret the output of a network as a probability statement, the sum-squared error corresponds to minus the log-likelihood or the Kullback-Leibler divergence, and linear averaging of the outputs corresponds to logarithmic averaging of the probability statements: the logarithmic opinion pool. The crux of this paper is that this whole story about model averaging, bias/variance decompositions, and quadratic programming to find the optimal weighting factors is not specific to the sum-squared error. It applies to the combination of probability statements of any kind in a logarithmic opinion pool, as long as the Kullback-Leibler divergence plays the role of the error measure. As examples we treat model averaging for classification models under a cross-entropy error measure and models for estimating variances.
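For weights that sum to one, the logarithmic opinion pool admits an exact decomposition analogous to the bias/variance (ambiguity) decomposition. Let K(q, p) = Σ_y q(y) log(q(y)/p(y)), and let p̄ be the pool. Then K(q, p̄) = Σ_i w_i K(q, p_i) - Σ_i w_i K(p̄, p_i). The sketch below checks this identity numerically on made-up distributions. The notation is ours and may not match the paper's. Because the second sum is non-negative, the pool's divergence from the target never exceeds the weighted average of the individual models' divergences.

```python
import math


def log_opinion_pool(dists, weights):
    """Combine discrete distributions as p(y) proportional to prod_i p_i(y) ** w_i."""
    n = len(dists[0])
    logs = [sum(w * math.log(p[y]) for p, w in zip(dists, weights)) for y in range(n)]
    m = max(logs)  # shift before exponentiating for numerical stability
    unnorm = [math.exp(v - m) for v in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def kl(q, p):
    """Kullback-Leibler divergence K(q, p) = sum_y q(y) log(q(y) / p(y))."""
    return sum(qy * math.log(qy / py) for qy, py in zip(q, p) if qy > 0.0)


# Made-up target and model outputs over three classes; weights sum to one
target = [0.7, 0.2, 0.1]
models = [[0.6, 0.3, 0.1], [0.5, 0.25, 0.25], [0.8, 0.1, 0.1]]
weights = [0.5, 0.3, 0.2]
pool = log_opinion_pool(models, weights)

# Error of the pool = weighted model errors - weighted "ambiguity" of the models
avg_error = sum(w * kl(target, p) for w, p in zip(weights, models))
ambiguity = sum(w * kl(pool, p) for w, p in zip(weights, models))
lhs = kl(target, pool)
rhs = avg_error - ambiguity
print(lhs, rhs)
```

The ambiguity term does not involve the target, so it can be estimated from unlabeled inputs. This parallels the sum-squared-error case.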

kullback-leibler divergence, logarithmic opinion pool, weighting factor

Country:

Europe > Netherlands > Gelderland > Nijmegen (0.05)
Asia > Middle East > Jordan (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)

Vinje, William E., Gallant, Jack L.

Modeling Complex Cells in an Awake Macaque during Natural Image Viewing

Our model consists of a classical energy mechanism whose output is divided by nonclassical gain control and texture contrast mechanisms. We apply this model to review movies, a stimulus sequence that replicates the stimulation a cell receives during free viewing of natural images. Data were collected from three cells using five different review movies, and the model was fit separately to the data from each movie. For the energy mechanism alone we find modest but significant correlations ($r_E = 0.41, 0.43, 0.59, 0.35$) between model and data. These correlations are improved somewhat when we allow for suppressive surround effects ($r_{E+G} = 0.42, 0.56, 0.60, 0.37$). In one case the inclusion of a delayed suppressive surround dramatically improves the fit to the data by modifying the time course of the model's response.
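To make the model structure concrete, the sketch below implements a hypothetical one-dimensional version. The energy mechanism is built from a quadrature pair of Gabor filters, and its output is divided by 1 + k times the mean squared contrast in a surround region. The sketch also includes the Pearson correlation used to compare model and data. The filter shapes, the divisive formula and the constant k are assumptions for illustration only. The texture contrast mechanism is omitted, and we assume that the reported r values are Pearson correlations.

```python
import math


def gabor_pair(n, freq, sigma):
    """Even and odd 1-D Gabor filters (a quadrature pair) of length n."""
    c = (n - 1) / 2.0
    env = [math.exp(-0.5 * ((i - c) / sigma) ** 2) for i in range(n)]
    even = [e * math.cos(2 * math.pi * freq * (i - c)) for i, e in enumerate(env)]
    odd = [e * math.sin(2 * math.pi * freq * (i - c)) for i, e in enumerate(env)]
    return even, odd


def energy(patch, even, odd):
    # Classical energy mechanism: sum of squared quadrature-pair responses
    e = sum(p * f for p, f in zip(patch, even))
    o = sum(p * f for p, f in zip(patch, odd))
    return e * e + o * o


def model_response(patch, surround, even, odd, k=1.0):
    # Hypothetical divisive gain control driven by mean squared surround contrast
    gain = sum(s * s for s in surround) / max(len(surround), 1)
    return energy(patch, even, odd) / (1.0 + k * gain)


def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


even, odd = gabor_pair(33, 0.125, 4.0)
grating = [math.cos(2 * math.pi * 0.125 * (i - 16)) for i in range(33)]
r_quiet = model_response(grating, [0.0] * 10, even, odd)  # blank surround
r_busy = model_response(grating, [1.0] * 10, even, odd)   # high-contrast surround
print(r_quiet, r_busy)  # the surround suppresses the response
```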

energy mechanism, mechanism, review movie

Country:

North America > United States > California > Alameda County > Berkeley (0.15)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence (0.48)
Information Technology > Sensing and Signal Processing > Image Processing (0.35)

Vigário, Ricardo, Jousmäki, Veikko, Hämäläinen, Matti, Hari, Riitta, Oja, Erkki

Independent Component Analysis for Identification of Artifacts in Magnetoencephalographic Recordings

We have studied the application of an independent component analysis (ICA) approach to the identification and possible removal of artifacts from a magnetoencephalographic (MEG) recording.
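The abstract does not spell out the ICA algorithm. As a hedged illustration only, the sketch below separates two synthetic mixed channels. It whitens the data and then scans rotation angles for maximal squared kurtosis. This brute-force stand-in is practical only in two dimensions and is not the algorithm used in the paper. The sources, the mixing matrix and all function names are made up for the example.

```python
import math
import random


def kurtosis(y):
    # Excess kurtosis E[y^4] / E[y^2]^2 - 3 (y is assumed to be centred)
    m2 = sum(v * v for v in y) / len(y)
    m4 = sum(v ** 4 for v in y) / len(y)
    return m4 / (m2 * m2) - 3.0


def corr(a, b):
    # Pearson correlation coefficient
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def ica_2d(x1, x2, n_angles=180):
    """Two-channel ICA sketch: whiten, then scan rotations for maximal non-Gaussianity."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    caa = sum(u * u for u in a) / n
    cab = sum(u * v for u, v in zip(a, b)) / n
    cbb = sum(v * v for v in b) / n
    # Whitening with the Cholesky factor L of the covariance matrix: z = L^{-1} x
    l11 = math.sqrt(caa)
    l21 = cab / l11
    l22 = math.sqrt(cbb - l21 * l21)
    z1 = [u / l11 for u in a]
    z2 = [(v - l21 * w) / l22 for v, w in zip(b, z1)]
    best = None
    for k in range(n_angles):
        # Angles beyond pi/2 only permute or flip the sign of the outputs
        th = 0.5 * math.pi * k / n_angles
        c, s = math.cos(th), math.sin(th)
        y1 = [c * p + s * q for p, q in zip(z1, z2)]
        y2 = [-s * p + c * q for p, q in zip(z1, z2)]
        score = kurtosis(y1) ** 2 + kurtosis(y2) ** 2
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


# Synthetic two-channel "recording": a uniform signal and a binary, artifact-like signal
rng = random.Random(0)
n_samples = 2000
s1 = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
s2 = [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n_samples)]
x1 = [u + 0.6 * v for u, v in zip(s1, s2)]
x2 = [0.4 * u + v for u, v in zip(s1, s2)]

y1, y2 = ica_2d(x1, x2)
match_s1 = max(abs(corr(y1, s1)), abs(corr(y2, s1)))
match_s2 = max(abs(corr(y1, s2)), abs(corr(y2, s2)))
print(match_s1, match_s2)
```

ICA recovers components only up to order, sign and scale, so each estimate is matched to a source by absolute correlation. Removal would then amount to zeroing the component judged to be an artifact and mapping the remaining components back to the sensor channels. That step is not shown here.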

artifact, component analysis, independent component

Country: Europe > Finland > Uusimaa > Helsinki (0.05)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)