AITopics

A new algorithm for Support Vector regression is described.

fraction, regression, vapnik, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Tight Bounds for the VC-Dimension of Piecewise Polynomial Networks

Sakurai, Akito

O(ws(s log d log(dqh/ s))) and O(ws((h/ s) log q) log(dqh/ s)) are upper bounds for the VC-dimension of a set of neural networks of units with piecewise polynomial activation functions, where s is the depth of the network, h is the number of hidden units, w is the number of adjustable parameters, q is the maximum of the number of polynomial segments of the activation function, and d is the maximum degree of the polynomials; also n(wslog(dqh/s)) is a lower bound for the VC-dimension of such a network set, which are tight for the cases s 8(h) and s is constant. For the special case q 1, the VC-dimension is 8(ws log d). 1 Introduction In spite of its importance, we had been unable to obtain VC-dimension values for practical types of networks, until fairly tight upper and lower bounds were obtained ([6], [8], [9], and [10]) for linear threshold element networks in which all elements perform a threshold function on weighted sum of inputs. This is mainly because the differentiability of the functions is needed to perform backpropagation or other learning algorithms. Unfortunately explicit bounds obtained so far for the VC-dimension of sigmoidal networks exhibit large gaps (O(w2h2) ([3]), n(w log h) for bounded depth 324 A. Sakurai and f!(wh) for unbounded depth) and are hard to improve. For the piecewise linear case, Maass obtained a result that the VO-dimension is O(w210g q), where q is the number of linear pieces of the function ([5]).

activation function, polynomial, vc-dimension, (12 more...)

Country: Asia > Japan (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Rae, H. C., Sollich, Peter, Coolen, Anthony C. C.

On-Line Learning with Restricted Training Sets: Exact Solution as Benchmark for General Theories

Calculation of Q(t) and R(t) using (4, 5, 7, 9) to execute the path average and the average over sets is relatively straightforward, albeit tedious. We find that -"Yt(l -"Yt)

activation function, polynomial, vc-dimension, (14 more...)

Country:

Asia > Japan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Instructional Material > Online (0.50)

Industry: Education > Educational Setting > Online (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Opper, Manfred, Winther, Ole

Mean Field Methods for Classification with Gaussian Processes

We discuss the application of TAP mean field methods known from the Statistical Mechanics of disordered systems to Bayesian classification models with Gaussian processes. In contrast to previous approaches, no knowledge about the distribution of inputs is needed. Simulation results for the Sonar data set are given.

classification, gaussian process, mean field method, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.05)
Europe > Denmark > Capital Region > Copenhagen (0.05)
(2 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Opper, Manfred, Vivarelli, Francesco

General Bounds on Bayes Errors for Regression with Gaussian Processes

Based on a simple convexity lemma, we develop bounds for different types of Bayesian prediction errors for regression with Gaussian processes. The basic bounds are formulated for a fixed training set. Simpler expressions are obtained for sampling from an input distribution which equals the weight function of the covariance kernel, yielding asymptotically tight results. The results are compared with numerical experiments.

gaussian process, general bound, regression, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Modeling & Simulation (0.95)

Meir, Ron, Maiorov, Vitaly

On the Optimality of Incremental Neural Network Algorithms

We study the approximation of functions by two-layer feedforward neural networks, focusing on incremental algorithms which greedily add units, estimating single unit parameters at each stage. As opposed to standard algorithms for fixed architectures, the optimization at each stage is performed over a small number of parameters, mitigating many of the difficult numerical problems inherent in high-dimensional nonlinear optimization. We establish upper bounds on the error incurred by the algorithm, when approximating functions from the Sobolev class, thereby extending previous results which only provided rates of convergence for functions in certain convex hulls of functional spaces. By comparing our results to recently derived lower bounds, we show that the greedy algorithms are nearly optimal. Combined with estimation error results for greedy algorithms, a strong case can be made for this type of approach.

algorithm, approximation, neural network, (14 more...)

Country:

Asia > Middle East > Israel > Haifa District > Haifa (0.05)
North America > United States (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Li, Zhaoping, Dayan, Peter

Computational Differences between Asymmetrical and Symmetrical Networks

However, because of the separation between excitation and inhibition, biological neural networks are asymmetrical. We study characteristic differences between asymmetrical networks and their symmetrical counterparts, showing that they have dramatically different dynamical behavior and also how the differences can be exploited for computational ends. We illustrate our results in the case of a network that is a selective amplifier.

computational difference, inhibitory cell, symmetry, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > China > Hong Kong (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Kearns, Michael J., Saul, Lawrence K.

Inference in Multilayer Networks via Large Deviation Bounds

Arguably one of the most important types of information processing is the capacity for probabilistic reasoning. The properties of undirectedproDabilistic models represented as symmetric networks have been studied extensively using methods from statistical mechanics (Hertz et aI, 1991). Detailed analyses of these models are possible by exploiting averaging phenomena that occur in the thermodynamic limit of large networks. In this paper, we analyze the limit of large, multilayer networks for probabilistic models represented as directed acyclic graphs. These models are known as Bayesian networks (Pearl, 1988; Neal, 1992), and they have different probabilistic semantics than symmetric neural networks (such as Hopfield models or Boltzmann machines). We show that the intractability of exact inference in multilayer Bayesian networks Inference in Multilayer Networks via Large Deviation Bounds 261 does not preclude their effective use. Our work builds on earlier studies of variational methods (Jordan et aI, 1997).

marginal probability, node, probability, (14 more...)

Country:

Asia > Middle East > Jordan (0.25)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > San Mateo County > Redwood City (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Ikeda, Shiro, Amari, Shun-ichi, Nakahara, Hiroyuki

Convergence of the Wake-Sleep Algorithm

The W-S (Wake-Sleep) algorithm is a simple learning rule for the models with hidden variables. It is shown that this algorithm can be applied to a factor analysis model which is a linear version of the Helmholtz machine. But even for a factor analysis model, the general convergence is not proved theoretically. In this article, we describe the geometrical understanding of the W-S algorithm in contrast with the EM (Expectation Maximization) algorithm and the em algorithm. As the result, we prove the convergence of the W-S algorithm for the factor analysis model. We also show the condition for the convergence in general models.

algorithm, factor analysis model, generative model, (14 more...)

Country: Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Herschkowitz, Didier, Nadal, Jean-Pierre

Unsupervised and Supervised Clustering: The Mutual Information between Parameters and Observations

Recent works in parameter estimation and neural coding have demonstrated that optimal performance are related to the mutual information between parameters and data. We consider the mutual information in the case where the dependency in the parameter (a vector 8) of the conditional p.d.f. of each observation (a vector

calculation, estimator, mutual information, (11 more...)

Country:

Asia > Brunei (0.06)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)