AITopics

ABSTRACT We present a Bayesian framework for inferring the parameters of a mixture of experts model based on ensemble learning by variational free energy minimisation. The Bayesian approach avoids the over-fitting and noise level underestimation problems of traditional maximum likelihood inference. We demonstrate these methods on artificial problems and sunspot time series prediction. INTRODUCTION The task of estimating the parameters of adaptive models such as artificial neural networks using Maximum Likelihood (ML) is well documented ego Geman, Bienenstock & Doursat (1992). ML estimates typically lead to models with high variance, a process known as "over-fitting".

algorithm, bayesian method, hyperparameter, (12 more...)

Country:

Asia > Middle East > Jordan (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
North America > Canada > Ontario > Toronto (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Semenov, Serguei A., Shuvalova, Irina B.

Some results on convergent unlearning algorithm

In the past years the unsupervised learning schemes arose strong interest among researchers but for the time being a little is known about underlying learning mechanisms, as well as still less rigorous results like convergence theorems were obtained in this field. One of promising concepts along this line is so called "unlearning" for the Hopfield-type neural networks (Hopfield et ai, 1983, van Hemmen & Klemmer, 1992, Wimbauer et ai, 1994). Elaborating that elegant ideas the convergent unlearning algorithm has recently been proposed (Plakhov & Semenov, 1994), executing without patterns presentation. It is aimed at to correct initial Hebbian connectivity in order to provide extensive storage of arbitrary correlated data. This algorithm is stated as follows. Pick up at iteration step m, m 0,1,2,... a random network state s(m)

convergence theorem, plakhov & semenov, probability, (12 more...)

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Europe > France (0.04)
Asia > Russia (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.55)

Kowalczyk, Adam, Szymanski, Jacek, Bartlett, Peter L., Williamson, Robert C.

Examples of learning curves from a modified VC-formalism

We examine the issue of evaluation of model specific parameters in a modified VC-formalism. Two examples are analyzed: the 2-dimensional homogeneous perceptron and the I-dimensional higher order neuron. Both models are solved theoretically, and their learning curves are compared against true learning curves. It is shown that the formalism has the potential to generate a variety of learning curves, including ones displaying ''phase transitions."

learning curve, phase transition, vc-formalism, (13 more...)

Country:

Oceania > Australia > South Australia (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.35)

West, Ansgar H. L., Saad, David

Adaptive Back-Propagation in On-Line Learning of Multilayer Networks

This research has been motivated by the dominance of the suboptimal symmetric phase in online learning of two-layer feedforward networks trained by gradient descent [2]. This trapping is emphasized for inappropriate small learning rates but exists in all training scenarios, effecting the learning process considerably. We Adaptive Back-Propagation in Online Learning of Multilayer Networks 329 proposed an adaptive back-propagation training algorithm [Eq.

algorithm, equation, gradient descent, (11 more...)

Country: Europe > United Kingdom (0.05)

Genre: Instructional Material > Online (0.40)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Auer, Peter, Herbster, Mark, Warmuth, Manfred K. K.

Exponentially many local minima for single neurons

We show that for a single neuron with the logistic function as the transfer function the number of local minima of the error function based on the square loss can grow exponentially in the dimension.

error function, local minima, minima, (15 more...)

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Austria (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Helmbold, David P., Kivinen, Jyrki, Warmuth, Manfred K. K.

Worst-case Loss Bounds for Single Neurons

We analyze and compare the well-known Gradient Descent algorithm and a new algorithm, called the Exponentiated Gradient algorithm, for training a single neuron with an arbitrary transfer function. Both algorithms are easily generalized to larger neural networks, and the generalization of Gradient Descent is the standard back-propagation algorithm. In this paper we prove worstcase loss bounds for both algorithms in the single neuron case. Since local minima make it difficult to prove worst-case bounds for gradient-based algorithms, we must use a loss function that prevents the formation of spurious local minima. We define such a matching loss function for any strictly increasing differentiable transfer function and prove worst-case loss bound for any such transfer function and its corresponding matching loss. For example, the matching loss for the identity function is the square loss and the matching loss for the logistic sigmoid is the entropic loss. The different structure of the bounds for the two algorithms indicates that the new algorithm outperforms Gradient Descent when the inputs contain a large number of irrelevant components.

algorithm, transfer function, weight vector, (15 more...)

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.15)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)

Saad, David, Solla, Sara A.

Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks

We consider the problem of online gradient descent learning for general two-layer neural networks. An analytic solution is presented and used to investigate the role of the learning rate in controlling the evolution and convergence of the learning process. Two-layer networks with an arbitrary number of hidden units have been shown to be universal approximators [1] for such N-to-one dimensional maps. We investigate the emergence of generalization ability in an online learning scenario [2], in which the couplings are modified after the presentation of each example so as to minimize the corresponding error. The resulting changes in {J} are described as a dynamical evolution; the number of examples plays the role of time.

equation, generalization error, vector, (13 more...)

Country:

North America > United States (0.04)
Europe > United Kingdom (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)

Marchand, Mario, Hadjifaradji, Saeed

Strong Unimodality and Exact Learning of Constant Depth µ-Perceptron Networks

strong unimodality and exact learning

Abstract Missing

Country:

Europe > United Kingdom (0.04)
Asia > Middle East > UAE (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.40)

Shawe-Taylor, John, Zhao, Jieyu

Generalisation of A Class of Continuous Neural Networks

More recently attempts have been made to introduce some computational cost related to the accuracy of the computations [5]. The model proposed in this paper weakens the computational power still further by relying on classical boolean circuits to perform the computation using a simple encoding of the real values. Using this encoding we also show that Teo circuits interpreted in the model correspond to a Neural Network design referred to as Bit Stream Neural Networks, which have been developed for hardware implementation [8]. With the perspective afforded by the general approach considered here, we are also able to analyse the Bit Stream Neural Networks (or indeed any other adaptive system based on the technique), giving VC dimension and sample size bounds for PAC learning.

bernoulli sequence, neural network, probability, (14 more...)

Country: Europe > Switzerland (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Coolen, A.C.C., Laughton, S. N., Sherrington, D.

Modern Analytic Techniques to Solve the Dynamics of Recurrent Neural Networks

We describe the use of modern analytical techniques in solving the dynamics of symmetric and nonsymmetric recurrent neural networks near saturation. These explicitly take into account the correlations between the post-synaptic potentials, and thereby allow for a reliable prediction of transients. 1 INTRODUCTION Recurrent neural networks have been rather popular in the physics community, because they lend themselves so naturally to analysis with tools from equilibrium statistical mechanics. This was the main theme of physicists between, say, 1985 and 1990. Less familiar to the neural network community is a subsequent wave of theoretical physical studies, dealing with the dynamics of symmetric and nonsymmetric recurrent networks. The strategy here is to try to describe the processes at a reduced level of an appropriate small set of dynamic macroscopic observables.

analytic technique, sherrington, simulation, (14 more...)

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)