AITopics

ASakurai@jaist.ac.jp Abstract: O(ws(s log d + log(dqh/s))) and O(ws((h/s) log q) + log(dqh/s)) are upper bounds for the VC-dimension of a set of neural networks of units with piecewise polynomial activation functions, where s is the depth of the network, h is the number of hidden units, w is the number of adjustable parameters, q is the maximum number of polynomial segments of the activation function, and d is the maximum degree of the polynomials; also Ω(ws log(dqh/s)) is a lower bound for the VC-dimension of such a network set. These bounds are tight for the cases s = Θ(h) and s constant. For the special case q = 1, the VC-dimension is Θ(ws log d). 1 Introduction: In spite of its importance, we had been unable to obtain VC-dimension values for practical types of networks, until fairly tight upper and lower bounds were obtained ([6], [8], [9], and [10]) for linear threshold element networks in which all elements perform a threshold function on a weighted sum of inputs. This is mainly because the differentiability of the functions is needed to perform backpropagation or other learning algorithms. Unfortunately, explicit bounds obtained so far for the VC-dimension of sigmoidal networks exhibit large gaps (O(w²h²) ([3]), Ω(w log h) for bounded depth, and Ω(wh) for unbounded depth) and are hard to improve. For the piecewise linear case, Maass obtained a result that the VC-dimension is O(w² log q), where q is the number of linear pieces of the function ([5]).

activation function, artificial intelligence, machine learning, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Rae, H. C., Sollich, Peter, Coolen, Anthony C. C.

On-Line Learning with Restricted Training Sets: Exact Solution as Benchmark for General Theories

Calculation of Q(t) and R(t) using (4, 5, 7, 9) to execute the path average and the average over sets is relatively straightforward, albeit tedious. We find that -"Yt(l -"Yt)

activation function, artificial intelligence, machine learning, (16 more...)

Country: Asia (0.14)

Genre: Instructional Material > Online (0.50)

Industry: Education > Educational Setting > Online (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)

Opper, Manfred, Winther, Ole

Mean Field Methods for Classification with Gaussian Processes

We discuss the application of TAP mean field methods known from the Statistical Mechanics of disordered systems to Bayesian classification models with Gaussian processes. In contrast to previous approaches, no knowledge about the distribution of inputs is needed. Simulation results for the Sonar data set are given. They have been recently introduced into the Neural Computation community (Neal 1996, Williams & Rasmussen 1996, Mackay 1997). If we assume fields with zero prior mean, the statistics of h is entirely defined by the second order correlations C(s, s') = E[h(s)h(s')], where E denotes expectations with respect to the prior. Interesting examples are C(s, s') (1) C(s, s') (2) The choice (1) can be motivated as a limit of a two-layered neural network with infinitely many hidden units with factorizable input-hidden weight priors (Williams 1997).
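To make the role of the covariance concrete, here is a minimal sketch of drawing a zero-mean field h from a Gaussian process prior that is fully specified by C(s, s'). The squared-exponential kernel, the function names, and the jitter value are all assumptions chosen for illustration; the kernel is a stand-in and is not claimed to be either of the two choices the abstract refers to.

```python
import math
import random
from typing import Callable, List, Sequence

Kernel = Callable[[float, float], float]


def rbf_kernel(s: float, t: float, length_scale: float = 1.0) -> float:
    """Assumed squared-exponential covariance C(s, t); a stand-in kernel."""
    return math.exp(-0.5 * ((s - t) / length_scale) ** 2)


def covariance_matrix(points: Sequence[float], kernel: Kernel,
                      jitter: float = 1e-9) -> List[List[float]]:
    """Matrix with entries C(s_i, s_j); jitter on the diagonal aids stability."""
    n = len(points)
    return [[kernel(points[i], points[j]) + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(c: List[List[float]]) -> List[List[float]]:
    """Lower-triangular factor L with L L^T = c (c must be positive definite)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def sample_prior(points: Sequence[float], kernel: Kernel,
                 rng: random.Random) -> List[float]:
    """Draw h = L z with z standard normal, so that E[h h^T] = C."""
    low = cholesky(covariance_matrix(points, kernel))
    z = [rng.gauss(0.0, 1.0) for _ in points]
    return [sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(len(points))]
```

Calling sample_prior([0.0, 1.0, 2.0, 3.0], rbf_kernel, random.Random(0)) returns one draw of the field at the four input points; swapping in a different kernel changes only the prior correlations, not the sampling procedure.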

artificial intelligence, classification, machine learning, (17 more...)

Country:

Europe > Denmark (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom (0.14)
Europe > Sweden (0.14)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Opper, Manfred, Vivarelli, Francesco

General Bounds on Bayes Errors for Regression with Gaussian Processes

Based on a simple convexity lemma, we develop bounds for different types of Bayesian prediction errors for regression with Gaussian processes. The basic bounds are formulated for a fixed training set. Simpler expressions are obtained for sampling from an input distribution which equals the weight function of the covariance kernel, yielding asymptotically tight results. The results are compared with numerical experiments.

artificial intelligence, machine learning, modeling & simulation, (18 more...)

Country:

Europe > United Kingdom (0.28)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Modeling & Simulation (0.95)

Meir, Ron, Maiorov, Vitaly

On the Optimality of Incremental Neural Network Algorithms

We study the approximation of functions by two-layer feedforward neural networks, focusing on incremental algorithms which greedily add units, estimating single unit parameters at each stage. As opposed to standard algorithms for fixed architectures, the optimization at each stage is performed over a small number of parameters, mitigating many of the difficult numerical problems inherent in high-dimensional nonlinear optimization. We establish upper bounds on the error incurred by the algorithm, when approximating functions from the Sobolev class, thereby extending previous results which only provided rates of convergence for functions in certain convex hulls of functional spaces. By comparing our results to recently derived lower bounds, we show that the greedy algorithms are nearly optimal. Combined with estimation error results for greedy algorithms, a strong case can be made for this type of approach.
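The idea of adding one unit at a time can be sketched as follows. This is a generic greedy residual-fitting loop, not the authors' algorithm: the tanh units, the finite candidate grid of slopes and offsets, the least-squares output weight, and all names are assumptions made for illustration. Each stage searches over the parameters of a single unit only, which is the point the abstract makes about low-dimensional optimization.

```python
import math
from typing import List, Sequence, Tuple

Unit = Tuple[float, float, float]  # (slope a, offset b, output weight c)


def greedy_fit(xs: Sequence[float], ys: Sequence[float], n_units: int,
               slopes: Sequence[float],
               offsets: Sequence[float]) -> Tuple[List[Unit], List[float]]:
    """Greedily add units c * tanh(a * x + b), one per stage.

    Returns the chosen units and the mean squared error after each stage.
    """
    residual = list(ys)
    units: List[Unit] = []
    errors: List[float] = []
    for _ in range(n_units):
        best = None
        for a in slopes:
            for b in offsets:
                g = [math.tanh(a * x + b) for x in xs]
                gg = sum(v * v for v in g)
                if gg == 0.0:
                    continue  # skip a unit that is identically zero
                # Least-squares output weight for this single unit.
                c = sum(r * v for r, v in zip(residual, g)) / gg
                err = sum((r - c * v) ** 2 for r, v in zip(residual, g))
                if best is None or err < best[0]:
                    best = (err, a, b, c, g)
        if best is None:
            break  # no usable candidate in the grid
        err, a, b, c, g = best
        units.append((a, b, c))
        # Earlier units stay fixed; only the residual is passed on.
        residual = [r - c * v for r, v in zip(residual, g)]
        errors.append(err / len(xs))
    return units, errors


def predict(units: Sequence[Unit], x: float) -> float:
    """Output of the two-layer network built so far."""
    return sum(c * math.tanh(a * x + b) for a, b, c in units)
```

Because the output weight of each new unit is fitted by least squares against the current residual, the training error cannot increase from one stage to the next (up to rounding), whatever candidate grid is used.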

approximation, artificial intelligence, machine learning, (17 more...)

Country: Asia > Middle East > Israel (0.15)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Mason, Llew, Bartlett, Peter L., Baxter, Jonathan

Direct Optimization of Margins Improves Generalization in Combined Classifiers

The dark curve is AdaBoost, the light curve is DOOM. DOOM sacrifices significant training error for improved test error (horizontal marks on margin 0 line). 1 Introduction: Many learning algorithms for pattern classification minimize some cost function of the training data, with the aim of minimizing error (the probability of misclassifying an example). One example of such a cost function is simply the classifier's error on the training data.
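As a small illustration of training error viewed as a cost function of margins, the sketch below computes the margin y * f(x) of a weighted vote over base classifiers and then the fraction of examples with non-positive margin. The normalization by the sum of absolute weights, the convention that a margin of exactly 0 counts as an error, and the function names are assumptions; the cost function that DOOM itself optimizes is not given in this excerpt and is not reproduced here.

```python
from typing import List, Sequence


def margins(weights: Sequence[float],
            base_predictions: Sequence[Sequence[int]],
            labels: Sequence[int]) -> List[float]:
    """Margin y * f(x) for each example.

    base_predictions[i][t] is the {-1, +1} vote of base classifier t on
    example i; f is the weighted vote normalized to lie in [-1, 1].
    """
    total = sum(abs(w) for w in weights)
    return [y * sum(w * h for w, h in zip(weights, preds)) / total
            for preds, y in zip(base_predictions, labels)]


def training_error(example_margins: Sequence[float]) -> float:
    """Fraction of examples with margin <= 0 (ties counted as errors)."""
    wrong = sum(1 for m in example_margins if m <= 0)
    return wrong / len(example_margins)


# Three base classifiers voting on three examples.
votes = [[1, 1, -1], [1, -1, -1], [-1, -1, 1]]
ms = margins([0.5, 0.25, 0.25], votes, [1, 1, -1])
```

Here ms is [0.5, 0.0, 0.5], so the second example sits exactly on the margin 0 line and the training error under the stated convention is 1/3.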

artificial intelligence, cost function, machine learning, (17 more...)

Country: North America > United States > California (0.14)

Industry: Health & Medicine (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Maass, Wolfgang, Sontag, Eduardo D.

A Precise Characterization of the Class of Languages Recognized by Neural Nets under Gaussian and Other Common Noise Distributions

We consider recurrent analog neural nets where each gate is subject to Gaussian noise, or any other common noise distribution whose probability density function is nonzero on a large set.

analog neural, artificial intelligence, machine learning, (17 more...)

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Karakoulas, Grigoris I., Shawe-Taylor, John

Optimizing Classifiers for Imbalanced Training Sets

Following recent results [9, 8] showing the importance of the fat-shattering dimension in explaining the beneficial effect of a large margin on generalization performance, the current paper investigates the implications of these results for the case of imbalanced datasets and develops two approaches to setting the threshold. The approaches are incorporated into ThetaBoost, a boosting algorithm for dealing with unequal loss functions. The performance of ThetaBoost and of the two approaches is tested experimentally.

artificial intelligence, machine learning, threshold, (18 more...)

Country: North America > Canada (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.31)

Herschkowitz, Didier, Nadal, Jean-Pierre

Unsupervised and Supervised Clustering: The Mutual Information between Parameters and Observations

Recent works in parameter estimation and neural coding have demonstrated that optimal performance is related to the mutual information between parameters and data. We consider the mutual information in the case where the dependency in the parameter (a vector θ) of the conditional p.d.f. of each observation (a vector

artificial intelligence, machine learning, mutual information, (13 more...)

Country:

Asia (0.16)
North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Gentile, Claudio, Warmuth, Manfred K.

Linear Hinge Loss and Average Margin

We describe a unifying method for proving relative loss bounds for online linear threshold classification algorithms, such as the Perceptron and the Winnow algorithms. For classification problems the discrete loss is used, i.e., the total number of prediction mistakes. We introduce a continuous loss function, called the "linear hinge loss", that can be employed to derive the updates of the algorithms. We first prove bounds with respect to the linear hinge loss and then convert them to the discrete loss. We introduce a notion of "average margin" of a set of examples. We show how relative loss bounds based on the linear hinge loss can be converted to relative loss bounds in terms of the discrete loss using the average margin.
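For readers unfamiliar with the two algorithms named above, the following sketch runs textbook versions of the Perceptron (additive update) and Winnow (multiplicative update) online and reports the discrete loss, i.e. the number of prediction mistakes. The specific variants are assumptions: a zero-threshold Perceptron with learning rate 1, and a Winnow variant over {0, 1} features with threshold n/2 and factor alpha. The paper's linear hinge loss and average margin are not implemented here, since their definitions are not given in this excerpt.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label in {-1, +1})


def perceptron_mistakes(examples: List[Example]) -> Tuple[List[float], int]:
    """Online Perceptron; returns final weights and the discrete loss."""
    w = [0.0] * len(examples[0][0])
    mistakes = 0
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if score >= 0 else -1
        if y_hat != y:
            mistakes += 1
            # Additive update, applied only on a mistake.
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes


def winnow_mistakes(examples: List[Example],
                    alpha: float = 2.0) -> Tuple[List[float], int]:
    """Online Winnow for {0, 1} features; returns weights and discrete loss."""
    n = len(examples[0][0])
    w = [1.0] * n
    theta = n / 2  # assumed fixed threshold
    mistakes = 0
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if score >= theta else -1
        if y_hat != y:
            mistakes += 1
            # Multiplicative update on active features: promote or demote.
            factor = alpha if y == 1 else 1.0 / alpha
            w = [wi * factor if xi == 1 else wi for wi, xi in zip(w, x)]
    return w, mistakes
```

Both functions make a single pass over the sequence and change the weights only when a prediction is wrong, so the returned count is exactly the total number of prediction mistakes on that sequence.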

algorithm, artificial intelligence, machine learning, (15 more...)

Country:

Europe (0.28)
North America > United States > California (0.14)

Industry: Education > Educational Setting > Online (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)