AITopics

We introduce a semi-supervised support vector machine (S3yM) method. Given a training set of labeled data and a working set of unlabeled data, S3YM constructs a support vector machine using both the training and working sets. We use S3 YM to solve the transduction problem using overall risk minimization (ORM) posed by Yapnik. The transduction problem is to estimate the value of a classification function at the given points in the working set. This contrasts with the standard inductive learning problem of estimating the classification function at all possible values and then using the fixed function to deduce the classes of the working set data.

generalization, support vector machine, vector machine, (14 more...)

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.05)
North America > United States > New York > Rensselaer County > Troy (0.05)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Learning a Hierarchical Belief Network of Independent Factor Analyzers

Attias, Hagai

The model parameters are learned in an unsupervised manner by maximizing the likelihood that these data are generated by the model. A multilayer belief network is a realization of such a model. Many belief networks have been proposed that are composed of binary units. The hidden units in such networks represent latent variables that explain different features of the data, and whose relation to the ·Current address: Gatsby Computational Neuroscience Unit, University College London, 17 Queen Square, London WC1N 3AR, U.K. 362 H. Attias data is highly nonlinear. However, for tasks such as object and speech recognition which produce real-valued data, the models provided by binary networks are often inadequate.

algorithm, approximation, latent variable, (16 more...)

Country:

Europe > United Kingdom (0.24)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

A Theory of Mean Field Approximation

Tanaka, Toshiyuki

I present a theory of mean field approximation based on information geometry. This theory includes in a consistent way the naive mean field approximation, as well as the TAP approach and the linear response theorem in statistical physics, giving clear information-theoretic interpretations to them. 1 INTRODUCTION Many problems of neural networks, such as learning and pattern recognition, can be cast into a framework of statistical estimation problem. How difficult it is to solve a particular problem depends on a statistical model one employs in solving the problem. For Boltzmann machines[ 1] for example, it is computationally very hard to evaluate expectations of state variables from the model parameters. Mean field approximation[2], which is originated in statistical physics, has been frequently used in practical situations in order to circumvent this difficulty.

approximation, field approximation, mean field approximation, (13 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.54)

Learning Curves for Gaussian Processes

Sollich, Peter

I consider the problem of calculating learning curves (i.e., average generalization performance) of Gaussian processes used for regression. A simple expression for the generalization error in terms of the eigenvalue decomposition of the covariance function is derived, and used as the starting point for several approximation schemes. I identify where these become exact, and compare with existing bounds on learning curves; the new approximations, which can be used for any input space dimension, generally get substantially closer to the truth. 1 INTRODUCTION: GAUSSIAN PROCESSES Within the neural networks community, there has in the last few years been a good deal of excitement about the use of Gaussian processes as an alternative to feedforward networks [lJ. The advantages of Gaussian processes are that prior assumptions about the problem to be learned are encoded in a very transparent way, and that inference-at least in the case of regression that I will consider-is relatively straightforward. One crucial question for applications is then how'fast' Gaussian processes learn, i.e., how many training examples are needed to achieve a certain level of generalization performance.

approximation, covariance function, gaussian process, (14 more...)

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.56)

Schölkopf, Bernhard, Bartlett, Peter L., Smola, Alex J., Williamson, Robert C.

Shrinking the Tube: A New Support Vector Regression Algorithm

A new algorithm for Support Vector regression is described.

fraction, regression, vapnik, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Tight Bounds for the VC-Dimension of Piecewise Polynomial Networks

Sakurai, Akito

O(ws(s log d log(dqh/ s))) and O(ws((h/ s) log q) log(dqh/ s)) are upper bounds for the VC-dimension of a set of neural networks of units with piecewise polynomial activation functions, where s is the depth of the network, h is the number of hidden units, w is the number of adjustable parameters, q is the maximum of the number of polynomial segments of the activation function, and d is the maximum degree of the polynomials; also n(wslog(dqh/s)) is a lower bound for the VC-dimension of such a network set, which are tight for the cases s 8(h) and s is constant. For the special case q 1, the VC-dimension is 8(ws log d). 1 Introduction In spite of its importance, we had been unable to obtain VC-dimension values for practical types of networks, until fairly tight upper and lower bounds were obtained ([6], [8], [9], and [10]) for linear threshold element networks in which all elements perform a threshold function on weighted sum of inputs. This is mainly because the differentiability of the functions is needed to perform backpropagation or other learning algorithms. Unfortunately explicit bounds obtained so far for the VC-dimension of sigmoidal networks exhibit large gaps (O(w2h2) ([3]), n(w log h) for bounded depth 324 A. Sakurai and f!(wh) for unbounded depth) and are hard to improve. For the piecewise linear case, Maass obtained a result that the VO-dimension is O(w210g q), where q is the number of linear pieces of the function ([5]).

activation function, polynomial, vc-dimension, (12 more...)

Country: Asia > Japan (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Rae, H. C., Sollich, Peter, Coolen, Anthony C. C.

On-Line Learning with Restricted Training Sets: Exact Solution as Benchmark for General Theories

Calculation of Q(t) and R(t) using (4, 5, 7, 9) to execute the path average and the average over sets is relatively straightforward, albeit tedious. We find that -"Yt(l -"Yt)

activation function, polynomial, vc-dimension, (14 more...)

Country:

Asia > Japan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Instructional Material > Online (0.50)

Industry: Education > Educational Setting > Online (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Opper, Manfred, Winther, Ole

Mean Field Methods for Classification with Gaussian Processes

We discuss the application of TAP mean field methods known from the Statistical Mechanics of disordered systems to Bayesian classification models with Gaussian processes. In contrast to previous approaches, no knowledge about the distribution of inputs is needed. Simulation results for the Sonar data set are given.

classification, gaussian process, mean field method, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.05)
Europe > Denmark > Capital Region > Copenhagen (0.05)
(2 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Opper, Manfred, Vivarelli, Francesco

General Bounds on Bayes Errors for Regression with Gaussian Processes

Based on a simple convexity lemma, we develop bounds for different types of Bayesian prediction errors for regression with Gaussian processes. The basic bounds are formulated for a fixed training set. Simpler expressions are obtained for sampling from an input distribution which equals the weight function of the covariance kernel, yielding asymptotically tight results. The results are compared with numerical experiments.

gaussian process, general bound, regression, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Modeling & Simulation (0.95)

Meir, Ron, Maiorov, Vitaly

On the Optimality of Incremental Neural Network Algorithms

We study the approximation of functions by two-layer feedforward neural networks, focusing on incremental algorithms which greedily add units, estimating single unit parameters at each stage. As opposed to standard algorithms for fixed architectures, the optimization at each stage is performed over a small number of parameters, mitigating many of the difficult numerical problems inherent in high-dimensional nonlinear optimization. We establish upper bounds on the error incurred by the algorithm, when approximating functions from the Sobolev class, thereby extending previous results which only provided rates of convergence for functions in certain convex hulls of functional spaces. By comparing our results to recently derived lower bounds, we show that the greedy algorithms are nearly optimal. Combined with estimation error results for greedy algorithms, a strong case can be made for this type of approach.

algorithm, approximation, neural network, (14 more...)

Country:

Asia > Middle East > Israel > Haifa District > Haifa (0.05)
North America > United States (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)