AITopics

Although one can derive the Gaussian noise assumption from a maximum entropy approach, the main reason for this assumption is practicability: under the Gaussian noise assumption, the maximum likelihood parameter estimate can be found simply by minimizing the squared error. Despite its common use, it is far from clear that the Gaussian noise assumption is a good choice for many practical problems. A reasonable approach would therefore be to use a noise distribution that contains the Gaussian as a special case but has a tunable parameter allowing for more flexible distributions.
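A minimal sketch of the argument above, assuming scalar regression residuals r. The generalized Gaussian (exponential power) family is used here only as one illustrative example of a noise model that contains the Gaussian as a special case: its shape parameter beta gives the Gaussian at beta = 2 and the heavier-tailed Laplace distribution at beta = 1. The choice of family and the function names are assumptions, not the method of the paper.

```python
import math

def gaussian_nll(residuals, sigma):
    """Negative log-likelihood of residuals under N(0, sigma^2).

    For fixed sigma this is sum(r^2) / (2 sigma^2) plus a constant,
    so maximizing the likelihood is the same as minimizing squared error.
    """
    n = len(residuals)
    const = n * math.log(sigma * math.sqrt(2.0 * math.pi))
    return const + sum(r * r for r in residuals) / (2.0 * sigma ** 2)

def generalized_gaussian_nll(residuals, alpha, beta):
    """Negative log-likelihood under the (illustrative) generalized Gaussian density
    beta / (2 alpha Gamma(1/beta)) * exp(-(|r| / alpha) ** beta).

    beta = 2 recovers a Gaussian with sigma = alpha / sqrt(2);
    beta = 1 gives a Laplace density, i.e. an absolute-error loss.
    """
    n = len(residuals)
    log_norm = math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta)
    return -n * log_norm + sum((abs(r) / alpha) ** beta for r in residuals)

if __name__ == "__main__":
    res = [0.1, -0.3, 0.2, 4.0]  # illustrative residuals; the last one is an outlier
    print(gaussian_nll(res, 1.0))
    print(generalized_gaussian_nll(res, alpha=math.sqrt(2.0), beta=2.0))  # same value
    print(generalized_gaussian_nll(res, alpha=1.0, beta=1.0))  # outlier costs |r|, not r^2
```

Tuning beta between 1 and 2 trades the squared-error fit of the Gaussian against the outlier robustness of the absolute-error loss.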

artificial intelligence, bayesian inference, machine learning

Country: Europe > Germany (0.15)

Industry: Education > Educational Setting > Online (0.44)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Andrieu, Christophe, Freitas, João F. G. de, Doucet, Arnaud

Robust Full Bayesian Methods for Neural Networks

In particular, MacKay showed that by approximating the distributions of the weights with Gaussians and adopting smoothing priors, it is possible to obtain estimates of the weights and output variances and to set the regularisation coefficients automatically. Neal (1996) cast the net much further by introducing advanced Bayesian simulation methods, specifically the hybrid Monte Carlo method, into the analysis of neural networks [3]. Bayesian sequential Monte Carlo methods have also been shown to provide good training results, especially in time-varying scenarios [4]. More recently, Rios Insua and Muller (1998) and Holmes and Mallick (1998) have addressed the issue of selecting the number of hidden neurons with growing and pruning algorithms from a Bayesian perspective [5,6]. In particular, they apply the reversible jump Markov chain Monte Carlo (MCMC) algorithm of Green [7] to feed-forward sigmoidal networks and radial basis function (RBF) networks to obtain joint estimates of the number of neurons and the weights. We also apply the reversible jump MCMC simulation algorithm to RBF networks so as to compute the joint posterior distribution of the radial basis parameters and the number of basis functions. However, we advance this area of research in two important directions. Firstly, we propose a full hierarchical prior for RBF networks.
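The hybrid (Hamiltonian) Monte Carlo method mentioned above can be sketched generically for any differentiable log-density. In the neural-network setting the target would be the posterior over the weights; here it is replaced by a two-dimensional standard normal purely for illustration. The function name, step size and number of leapfrog steps are assumptions, not values from the paper.

```python
import math
import random

def hmc_sample(log_prob, grad_log_prob, x0, n_samples, step_size=0.2, n_leapfrog=20, seed=0):
    """Hybrid (Hamiltonian) Monte Carlo with a leapfrog integrator and unit mass matrix."""
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        # Draw a fresh Gaussian momentum for every trajectory.
        p = [rng.gauss(0.0, 1.0) for _ in x]
        x_new, p_new = list(x), list(p)
        # Leapfrog: half momentum step, alternating full steps, final half momentum step.
        g = grad_log_prob(x_new)
        p_new = [pi + 0.5 * step_size * gi for pi, gi in zip(p_new, g)]
        for step in range(n_leapfrog):
            x_new = [xi + step_size * pi for xi, pi in zip(x_new, p_new)]
            g = grad_log_prob(x_new)
            scale = step_size if step < n_leapfrog - 1 else 0.5 * step_size
            p_new = [pi + scale * gi for pi, gi in zip(p_new, g)]
        # Metropolis accept/reject on the total energy H = -log p(x) + |p|^2 / 2.
        h_old = -log_prob(x) + 0.5 * sum(pi * pi for pi in p)
        h_new = -log_prob(x_new) + 0.5 * sum(pi * pi for pi in p_new)
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(list(x))
    return samples

if __name__ == "__main__":
    # Toy target: 2-D standard normal, standing in for a weight posterior.
    log_p = lambda x: -0.5 * sum(xi * xi for xi in x)
    grad_log_p = lambda x: [-xi for xi in x]
    draws = hmc_sample(log_p, grad_log_p, [3.0, -3.0], n_samples=4000)
    print(sum(d[0] for d in draws) / len(draws))  # close to 0
```

Using gradient information lets each proposal travel far along the target while the energy check keeps the acceptance rate high, which is what makes the method attractive for high-dimensional weight spaces.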

artificial intelligence, bayesian inference, machine learning

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Some Theoretical Results Concerning the Convergence of Compositions of Regularized Linear Functions

Zhang, Tong

Recently, sample complexity bounds have been derived for problems involving linear functions such as neural networks and support vector machines. In this paper, we extend some theoretical results in this area by deriving dimension-independent covering number bounds for regularized linear functions under certain regularization conditions. We show that such bounds lead to a class of new methods for training linear classifiers with theoretical advantages similar to those of the support vector machine. Furthermore, we present a theoretical analysis of these new methods from the asymptotic statistical point of view. This analysis provides a better description of the large-sample behavior of these algorithms.

1 Introduction

In this paper, we are interested in the generalization performance of linear classifiers obtained from certain algorithms.
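As a generic illustration of training a regularized linear classifier, the sketch below minimizes an L2-regularized logistic loss by gradient descent. It is not the specific family of methods derived in the paper, whose details are not given in this excerpt; the loss, the regularization weight lam, the learning rate and the toy data are all assumptions.

```python
import math

def train_regularized_linear(X, y, lam=0.01, lr=0.1, n_iter=500):
    """Gradient descent on (1/n) sum log(1 + exp(-y_i w.x_i)) + (lam/2) ||w||^2.

    Labels y_i are in {-1, +1}; returns the weight vector w.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [lam * wj for wj in w]  # gradient of the L2 regularizer
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)), chained with dm/dw = y_i x_i
            coef = -yi / (1.0 + math.exp(margin)) / n
            for j in range(d):
                grad[j] += coef * xi[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Linear classifier: sign of w.x (ties go to +1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

if __name__ == "__main__":
    X = [[2.0, 1.0], [1.0, 2.0], [1.5, 1.5], [-1.0, -2.0], [-2.0, -1.0], [-1.5, -1.5]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_regularized_linear(X, y)
    print(w, [predict(w, xi) for xi in X])
```

The regularization term keeps the norm of w bounded, which is the kind of constraint under which norm-based, dimension-independent capacity bounds apply.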

artificial intelligence, machine learning, theorem 3

Country: North America > United States (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.96)

Zhang, Liqing, Amari, Shun-ichi, Cichocki, Andrzej

Semiparametric Approach to Multichannel Blind Deconvolution of Nonminimum Phase Systems

In this paper we discuss the semiparametric statistical model for blind deconvolution. First we introduce a Lie group structure on the manifold of noncausal FIR filters. The blind deconvolution problem is then formulated in the framework of a semiparametric model, and a family of estimating functions is derived for blind deconvolution. A natural gradient learning algorithm is developed for training noncausal filters. The stability of the natural gradient algorithm is also analyzed in this framework.
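The natural gradient learning rule can be illustrated in its simplest, instantaneous (non-convolutive) form, where the unknown mixing is a matrix rather than a noncausal FIR filter. The sketch below uses the well-known online update W <- W + lr (I - f(y) y^T) W with the cubic nonlinearity f(y) = y^3, which suits sub-Gaussian sources; the convolutive, filter-domain version developed in the paper is not reproduced here. The mixing matrix, source distribution, learning rate and function name are assumptions for illustration.

```python
import math
import random

def natural_gradient_ica(xs, lr=0.005, epochs=2):
    """Online natural-gradient rule W <- W + lr * (I - f(y) y^T) W with f(y) = y**3.

    xs is a list of observed mixture vectors; returns the demixing matrix W.
    """
    n = len(xs[0])
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(epochs):
        for x in xs:
            y = [sum(W[i][k] * x[k] for k in range(n)) for i in range(n)]  # y = W x
            f = [yi ** 3 for yi in y]  # cubic nonlinearity for sub-Gaussian sources
            # G = I - f(y) y^T; the natural gradient multiplies it by W on the right.
            G = [[(1.0 if i == j else 0.0) - f[i] * y[j] for j in range(n)] for i in range(n)]
            W = [[W[i][j] + lr * sum(G[i][k] * W[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
    return W

if __name__ == "__main__":
    rng = random.Random(1)
    r3 = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has unit variance
    S = [[rng.uniform(-r3, r3), rng.uniform(-r3, r3)] for _ in range(20000)]
    A = [[1.0, 0.6], [0.4, 1.0]]  # hypothetical mixing matrix
    X = [[sum(A[i][k] * s[k] for k in range(2)) for i in range(2)] for s in S]
    W = natural_gradient_ica(X)
    # The global system W A should be close to a scaled permutation matrix.
    print([[sum(W[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)])
```

Because the update is right-multiplied by W, its behavior depends only on the global system W A and not on how badly conditioned the mixing is, which is the equivariance property that motivates the natural gradient.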

artificial intelligence, deconvolution, machine learning

Country:

Europe > France (0.14)
Oceania > Australia (0.14)
North America > United States > Wisconsin (0.14)
Asia > Japan (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Algebraic Analysis for Non-regular Learning Machines

Watanabe, Sumio

Hierarchical learning machines are non-regular and non-identifiable statistical models, whose true parameter sets are analytic sets with singularities. Using algebraic analysis, we rigorously prove that the stochastic complexity of a non-identifiable learning machine is asymptotically equal to $\lambda_1 \log n - (m_1 - 1) \log \log n$
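The leading terms of this asymptotic expression are easy to evaluate once λ₁ and m₁ are known. The sketch below treats them as given inputs (how they are computed is not covered in this excerpt) and drops the constant term. For a regular, identifiable model with d parameters, λ₁ = d/2 and m₁ = 1, so the expression reduces to the familiar BIC penalty (d/2) log n. The function name and the singular-model values in the usage example are hypothetical.

```python
import math

def stochastic_complexity_leading_terms(n, lambda_1, m_1):
    """lambda_1 * log n - (m_1 - 1) * log log n, with the O(1) constant omitted."""
    return lambda_1 * math.log(n) - (m_1 - 1) * math.log(math.log(n))

if __name__ == "__main__":
    n, d = 10_000, 10
    print(stochastic_complexity_leading_terms(n, d / 2, 1))  # regular model: (d/2) log n
    print(stochastic_complexity_leading_terms(n, 3.0, 2))    # hypothetical singular model
```

A smaller λ₁, or a larger m₁, means a smaller complexity penalty than the regular-model value, which is why BIC-style counting of parameters does not carry over directly to non-identifiable machines.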

algebraic analysis, artificial intelligence, machine learning

Country: Asia > Japan > Honshū > Kantō (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Probabilistic Methods for Support Vector Machines

Sollich, Peter

One of the open questions that remains is how to set the 'tunable' parameters of an SVM algorithm: while methods for choosing the width of the kernel function and the noise parameter C (which controls how closely the training data are fitted) have been proposed [4, 5] (see also, very recently, [6]), the effect of the overall shape of the kernel function remains imperfectly understood [1]. Error bars (class probabilities) for SVM predictions, which matter for example in safety-critical applications, are also difficult to obtain. In this paper I suggest that a probabilistic interpretation of SVMs could be used to tackle these problems. This interpretation shows that the SVM kernel defines a prior over functions on the input space, avoiding the need to think in terms of high-dimensional feature spaces. It also allows one to define quantities such as the evidence (likelihood) for a set of hyperparameters (C, the kernel amplitude K0, etc.). I give a simple approximation to the evidence, which can then be maximized to set such hyperparameters. The evidence is sensitive to the values of C and K0 individually, in contrast to properties (such as the cross-validation error) of the deterministic solution, which depends only on the product C·K0. It can therefore be used to assign an unambiguous value to C, from which error bars can be derived.
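The statement that the deterministic solution depends only on C·K0 can be checked on the standard soft-margin SVM dual. The derivation below assumes the kernel is written as an amplitude K0 times a fixed shape \tilde{K}; this parametrization and notation are assumptions for illustration and may differ from the paper's normalization.

```latex
% Soft-margin SVM dual with kernel K = K_0 \tilde{K} (\tilde{K} fixed, K_0 > 0)
\max_{\alpha}\; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K_0 \tilde{K}(x_i, x_j)
\quad \text{s.t.}\quad 0 \le \alpha_i \le C,\; \sum_i \alpha_i y_i = 0,
\qquad f(x) = \sum_i \alpha_i y_i K_0 \tilde{K}(x_i, x) + b.

% Substitute \beta_i = K_0 \alpha_i:
\max_{\beta}\; \frac{1}{K_0}\Big[\sum_i \beta_i - \frac{1}{2}\sum_{i,j} \beta_i \beta_j y_i y_j \tilde{K}(x_i, x_j)\Big]
\quad \text{s.t.}\quad 0 \le \beta_i \le C K_0,\; \sum_i \beta_i y_i = 0,
\qquad f(x) = \sum_i \beta_i y_i \tilde{K}(x_i, x) + b.
```

The positive factor 1/K_0 does not change the maximizer, so the coefficients β, and with them f (whose offset b is fixed by the usual KKT conditions on f), depend on C and K_0 only through the product C K_0.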

artificial intelligence, machine learning, support vector machine

Country: North America > United States (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Smola, Alex J., Shawe-Taylor, John, Schölkopf, Bernhard, Williamson, Robert C.

The Entropy Regularization Information Criterion

Effective methods of capacity control via uniform convergence bounds for function expansions have been largely limited to Support Vector machines, where good bounds are obtainable by the entropy number approach.

artificial intelligence, entropy number, machine learning

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.58)

Siegelmann, Hava T., Roitershtein, Alexander, Ben-Hur, Asa

Noisy Neural Networks and Generalizations

In this paper we define a probabilistic computational model which generalizes many noisy neural network models, including the recent work of Maass and Sontag [5]. We identify weak ergodicity as the mechanism responsible for restricting the computational power of probabilistic models to definite languages, independent of the characteristics of the noise (whether it is discrete or analog, and whether or not it depends on the input) and independent of whether the variables are discrete or continuous. We give examples of weakly ergodic models, including noisy computational systems with noise depending on the current state and inputs, aggregate models, and computational systems which update in continuous time.

1 Introduction

Noisy neural networks were recently examined, e.g.
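One standard way to quantify weak ergodicity is Dobrushin's ergodicity coefficient δ(P) = ½ max over row pairs of the total variation between rows. It is submultiplicative, equals 0 exactly when all rows coincide, and an inhomogeneous chain is weakly ergodic when δ of the products of its transition matrices tends to 0, i.e. the state forgets where it started. This excerpt does not say whether the paper uses this particular coefficient. The sketch computes it for a hypothetical noisy automaton whose transition depends on the input symbol; the helper names and the noise model are assumptions.

```python
def dobrushin_coefficient(P):
    """delta(P) = (1/2) max_{i,j} sum_k |P[i][k] - P[j][k]|; 0 means all rows are equal."""
    return max(
        0.5 * sum(abs(a - b) for a, b in zip(P[i], P[j]))
        for i in range(len(P)) for j in range(len(P))
    )

def matmul(A, B):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def noisy_shift(n, shift, eps):
    """Hypothetical input-dependent transition: move state i to (i + shift) mod n,
    except that with probability eps the next state is drawn uniformly at random."""
    return [[(1.0 - eps) * (1.0 if j == (i + shift) % n else 0.0) + eps / n for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    n, eps = 4, 0.1
    inputs = [1, 3, 0, 2, 1, 1, 2, 0, 3, 1]  # illustrative input symbols
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for s in inputs:
        P = matmul(P, noisy_shift(n, s, eps))
        print(round(dobrushin_coefficient(P), 4))  # shrinks like (1 - eps) ** steps
```

However the input sequence is chosen, the coefficient decays geometrically here, so after enough steps the state distribution carries almost no information about the distant past of the input.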

artificial intelligence, machine learning, weakly ergodic

Country: Asia > Middle East > Israel (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Lower Bounds on the Complexity of Approximating Continuous Functions by Sigmoidal Neural Networks

Schmitt, Michael

This is one of the theoretical results most frequently cited to justify the use of sigmoidal neural networks in applications. By this statement one refers to the fact that sigmoidal neural networks have been shown to be able to approximate any continuous function arbitrarily well. Numerous results in the literature have established variants of this universal approximation property by considering distinct function classes to be approximated by network architectures using different types of neural activation functions with respect to various approximation criteria, see for instance [1, 2, 3, 5, 6, 11, 12, 14, 15].
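The universal approximation property can be made concrete with the classic "staircase" construction on [0, 1]: place one steep sigmoid midway between neighboring grid points and weight it by the change of the target function across that interval. This is one textbook argument among many, not the construction or lower-bound technique of the paper; the grid, the steepness rule and the helper names are assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def staircase_network(f, n_hidden, steepness_factor=10.0):
    """One-hidden-layer sigmoidal network approximating f on [0, 1]."""
    grid = [k / n_hidden for k in range(n_hidden + 1)]
    thresholds = [(grid[k - 1] + grid[k]) / 2 for k in range(1, n_hidden + 1)]
    out_weights = [f(grid[k]) - f(grid[k - 1]) for k in range(1, n_hidden + 1)]
    bias = f(grid[0])
    s = steepness_factor * n_hidden  # steeper units as the grid gets finer

    def net(x):
        return bias + sum(c * sigmoid(s * (x - t)) for c, t in zip(out_weights, thresholds))

    return net

def max_error(f, net, n_points=1001):
    """Maximum absolute error on an evenly spaced grid over [0, 1]."""
    return max(abs(f(x) - net(x)) for x in (i / (n_points - 1) for i in range(n_points)))

if __name__ == "__main__":
    target = lambda x: math.sin(2 * math.pi * x)
    for hidden in (20, 200):
        print(hidden, max_error(target, staircase_network(target, hidden)))
```

For a smooth target the error shrinks roughly in proportion to 1/n_hidden, which shows how many units this naive construction spends; bounding that number from below is the kind of question the paper addresses.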

artificial intelligence, machine learning, polynomial

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Risau-Gusman, Sebastian, Gordon, Mirta B.

Understanding Stepwise Generalization of Support Vector Machines: a Toy Model

In this article we study the effects of introducing structure in the input distribution of the data to be learnt by a simple perceptron. We determine the learning curves within the framework of statistical mechanics. Stepwise generalization occurs as a function of the number of examples when the distribution of patterns is highly anisotropic. Although extremely simple, the model seems to capture the relevant features of a class of Support Vector Machines which was recently shown to exhibit this behavior.
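Why anisotropy matters can be seen from a quantity that underlies such learning curves: for zero-mean Gaussian inputs, a student perceptron disagrees with the teacher with probability arccos(ρ)/π, where ρ is the correlation between the fields w·x and v·x under the input covariance. The sketch below assumes a diagonal covariance and uses hypothetical teacher and student vectors; it is not the statistical-mechanics calculation of the paper.

```python
import math

def generalization_error(teacher, student, variances):
    """P(sign(student . x) != sign(teacher . x)) for zero-mean Gaussian inputs with
    independent components of the given variances: arccos(rho) / pi."""
    cross = sum(v * t * s for v, t, s in zip(variances, teacher, student))
    tt = sum(v * t * t for v, t in zip(variances, teacher))
    ss = sum(v * s * s for v, s in zip(variances, student))
    rho = max(-1.0, min(1.0, cross / math.sqrt(tt * ss)))  # clamp rounding noise
    return math.acos(rho) / math.pi

if __name__ == "__main__":
    teacher = [1, 1, 1, 1]
    student = [1, 1, 0, 0]  # has recovered only the first two components
    print(generalization_error(teacher, student, [1, 1, 1, 1]))        # isotropic: 0.25
    print(generalization_error(teacher, student, [10, 10, 0.1, 0.1]))  # anisotropic: ~0.03
```

With strongly anisotropic inputs, learning the few high-variance directions already brings the error close to zero, and the remaining low-variance directions are learned later, which is one intuition for a stepwise learning curve.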

artificial intelligence, machine learning, subspace

Country: Europe > France (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)