AITopics

The perturbation method which we have presented overcomes, albeit at considerable computational expense, the limitations of standard approaches, which are only appropriate for models with a single layer of adjustable weights. It has the added bonus of automatically taking into account the effect of regularisation techniques such as weight decay. The experimental results illustrate the application of the technique to two simple problems. As expected, the number of degrees of freedom in the models is found to be related to the amount of weight decay used during training. The equivalent kernels are found to vary significantly in different regions of input space, and the functions reconstructed from the estimated smoother matrices closely match the original...
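As a rough illustration of the idea in this abstract, the sketch below estimates a smoother matrix by perturbing one training target at a time and refitting, then takes its trace as the number of degrees of freedom. It is only a stand-in under stated assumptions: the model is a linear ridge regression (weight decay on a single layer of weights) rather than the neural networks the abstract concerns, and the names `fit_predict`, `smoother_by_perturbation`, and the toy data are hypothetical.

```python
import numpy as np

def fit_predict(X, y, weight_decay):
    """Ridge fit (linear model with weight decay); returns fitted values on X."""
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + weight_decay * np.eye(d), X.T @ y)
    return X @ w

def smoother_by_perturbation(X, y, weight_decay, eps=1e-3):
    """Estimate S[i, j] = d(yhat_i) / d(y_j) by perturbing one target at a time."""
    n = len(y)
    base = fit_predict(X, y, weight_decay)
    S = np.empty((n, n))
    for j in range(n):
        y_pert = y.copy()
        y_pert[j] += eps                      # nudge a single training target
        S[:, j] = (fit_predict(X, y_pert, weight_decay) - base) / eps
    return S

# Toy data (assumed for illustration): noisy sine, polynomial features with 6 weights.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
X = np.vander(x, 6, increasing=True)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)

# Degrees of freedom = trace of the smoother matrix, for increasing weight decay.
dof = {lam: np.trace(smoother_by_perturbation(X, y, lam)) for lam in (0.0, 0.1, 10.0)}
```

In this linear setting the trace equals the number of weights when there is no weight decay and shrinks as the decay grows, which mirrors the relationship the abstract reports. Each column of the matrix needs a full refit, which is where the computational expense comes from.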

equivalent kernel, kernel, neural network, (10 more...)

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Burges, Christopher J. C., Schölkopf, Bernhard

Improving the Accuracy and Speed of Support Vector Machines

Support Vector Learning Machines (SVMs) are finding application in pattern recognition, regression estimation, and operator inversion for ill-posed problems. Against this very general backdrop, any methods for improving the generalization performance, or for improving the speed in the test phase, of SVMs are of increasing interest. In this paper we combine two such techniques on a pattern recognition problem. The method for improving generalization performance (the "virtual support vector" method) does so by incorporating known invariances of the problem. This method reduces the error rate on 10,000 NIST test digit images from 1.4% to 1.0%.
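One way to picture "incorporating known invariances" is to generate transformed copies of the support vectors found by a first training run and then retrain on them. The sketch below covers only the generation step and assumes the invariance is a one-pixel image translation; the abstract does not name the transformation, and `shift_image` and `virtual_examples` are hypothetical helper names.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Translate a 2-D image by (dy, dx) pixels, filling vacated pixels with zeros."""
    h, w = img.shape
    out = np.zeros_like(img)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def virtual_examples(support_images, support_labels,
                     shifts=((1, 0), (-1, 0), (0, 1), (0, -1))):
    """Build shifted copies of each support vector, keeping its label (assumed invariance)."""
    images, labels = [], []
    for img, label in zip(support_images, support_labels):
        for dy, dx in shifts:
            images.append(shift_image(img, dy, dx))
            labels.append(label)
    return np.stack(images), np.array(labels)
```

Under this assumption, a second classifier would be trained on the shifted copies (optionally together with the original support vectors), so that the decision boundary reflects the invariance without enlarging the full training set.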

support vector, vapnik, vector, (14 more...)

Country:

North America > United States > New York (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Bradley, Paul S., Mangasarian, Olvi L., Street, W. Nick

Clustering via Concave Minimization

There are many approaches to the clustering problem, including statistical [9], machine learning [7], integer and mathematical programming [18,1]. In this paper we concentrate on a simple concave minimization formulation of the problem that leads to a finite and fast algorithm.
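The keywords for this entry mention a k-median algorithm. A minimal sketch of a generic k-median alternation is given below, assuming 1-norm distances and coordinate-wise medians as cluster centers; it is not taken from the paper, and the function name `k_median` and its arguments are hypothetical.

```python
import numpy as np

def k_median(points, centers, max_iter=100):
    """Alternate between 1-norm nearest-center assignment and median center updates."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        # 1-norm distance from every point to every center, shape (n_points, n_centers).
        dist = np.abs(points[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = points[labels == k]
            if len(members):                  # leave an empty cluster's center unchanged
                new_centers[k] = np.median(members, axis=0)
        if np.array_equal(new_centers, centers):
            break                             # no center moved, so the assignment is stable
        centers = new_centers
    return centers, labels
```

The loop stops as soon as no center moves, which is one sense in which an alternation of this kind terminates after finitely many steps.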

algorithm, correctness, k-median algorithm, (14 more...)

Country:

North America > United States > Oklahoma > Payne County > Stillwater (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Industry: Health & Medicine (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Barber, David, Williams, Christopher K. I.

Gaussian Processes for Bayesian Classification via Hybrid Monte Carlo

The full Bayesian method for applying neural networks to a prediction problem is to set up the prior/hyperprior structure for the net and then perform the necessary integrals. However, these integrals are not tractable analytically, and Markov Chain Monte Carlo (MCMC) methods are slow, especially if the parameter space is high-dimensional. Using Gaussian processes we can approximate the weight space integral analytically, so that only a small number of hyperparameters need be integrated over by MCMC methods. We have applied this idea to classification problems, obtaining excellent results on the real-world problems investigated so far.

1 INTRODUCTION

To make predictions based on a set of training data, fundamentally we need to combine our prior beliefs about possible predictive functions with the data at hand. In the Bayesian approach to neural networks a prior on the weights in the net induces a prior distribution over functions.

bayesian classification, gaussian process, hyperparameter, (12 more...)

Country: Europe > United Kingdom (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Baram, Yoram

Consistent Classification, Firm and Soft

A classifier is called consistent with respect to a given set of class-labeled points if it correctly classifies the set. We consider classifiers defined by unions of local separators and propose algorithms for consistent classifier reduction. The expected complexities of the proposed algorithms are derived along with the expected classifier sizes. In particular, the proposed approach yields a consistent reduction of the nearest neighbor classifier, which performs "firm" classification, assigning each new object to a class, regardless of the data structure. The proposed reduction method suggests a notion of "soft" classification, allowing for indecision with respect to objects which are insufficiently or ambiguously supported by the data. The performance of the proposed classifiers in predicting stock behavior is compared with that achieved by the nearest neighbor method.

classification, classifier, separator, (14 more...)

Country:

North America > United States (0.14)
Asia > Middle East > Israel > Haifa District > Haifa (0.05)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)

Industry: Banking & Finance > Trading (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Zeevi, Assaf J., Meir, Ron, Adler, Robert J.

Time Series Prediction using Mixtures of Experts

We consider the problem of prediction of stationary time series, using the architecture known as mixtures of experts (MEM). Here we suggest a mixture which blends several autoregressive models. This study focuses on some theoretical foundations of the prediction problem in this context. More precisely, it is demonstrated that this model is a universal approximator, with respect to learning the unknown prediction function. This statement is strengthened as upper bounds on the mean squared error are established. Based on these results it is possible to compare the MEM to other families of models (e.g., neural networks and state dependent models). It is shown that a degenerate version of the MEM is in fact equivalent to a neural network, and the number of experts in the architecture plays a similar role to the number of hidden units in the latter model.
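A mixture that blends several autoregressive models can be sketched as a gating function weighting the outputs of linear predictors. The example below assumes a softmax gate that is linear in the recent history and experts that are affine autoregressive models; these parameterisations and the name `mem_predict` are assumptions made for illustration, not details given in the abstract.

```python
import numpy as np

def mem_predict(history, expert_weights, expert_bias, gate_weights, gate_bias):
    """One-step prediction from the last d values using K autoregressive experts.

    history:        shape (d,), the most recent d observations
    expert_weights: shape (K, d), one autoregressive coefficient vector per expert
    gate_weights:   shape (K, d), parameters of the (assumed) softmax gate
    """
    x = np.asarray(history, dtype=float)
    expert_out = expert_weights @ x + expert_bias    # each expert's prediction, shape (K,)
    logits = gate_weights @ x + gate_bias
    logits = logits - logits.max()                   # stabilise the softmax
    gate = np.exp(logits) / np.exp(logits).sum()     # mixing weights, sum to 1
    return float(gate @ expert_out), gate
```

With all gate parameters set to zero the mixing weights are uniform and the prediction is the plain average of the experts; a gate that always selects one expert reduces the model to a single autoregressive predictor.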

mem, neural network, predictor function, (12 more...)

Country:

North America > United States > North Carolina > Orange County > Chapel Hill (0.14)
North America > United States > New York (0.05)
Asia > Middle East > Jordan (0.05)
(3 more...)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Rohwer, Richard, Morciniec, Michal

The Generalisation Cost of RAMnets

We follow a similar approach to (Zhu & Rohwer, to appear 1996) in using a Gaussian process to define a prior over the space of functions, so that the expected generalisation cost under the posterior can be determined. The optimal model is defined in terms of the restriction of this posterior to the subspace defined by the model. The optimum is easily determined for linear models over a set of basis functions. We go on to compute the generalisation cost (with an error bar) for all models of this class, which we demonstrate to include the RAMnets.

formalism, generalisation cost, ramnet, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Krzyzak, Adam, Linder, Tamás

Radial Basis Function Networks and Complexity Regularization in Function Learning

In this paper we apply the method of complexity regularization to derive estimation bounds for nonlinear function estimation using a single hidden layer radial basis function network.

function network, neural network, radial basis function network, (10 more...)

Country:

Europe > Hungary > Budapest > Budapest (0.05)
North America > United States > New York (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Drucker, Harris, Burges, Christopher J. C., Kaufman, Linda, Smola, Alex J., Vapnik, Vladimir

Support Vector Regression Machines

A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality space because SVR optimization does not depend on the dimensionality of the input space.
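Support vector regression is commonly formulated with an epsilon-insensitive loss, which ignores residuals smaller than a tolerance and penalises larger ones linearly. The abstract does not spell out the loss, so the sketch below should be read as background on the standard formulation; the function name and the default `epsilon` are hypothetical.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-example loss: zero inside the epsilon tube, linear in the excess outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)
```

Under this loss, training points whose residuals fall strictly inside the tube contribute nothing to the objective, so the fitted function is determined by the remaining points, the support vectors.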

dimensionality, feature space representation, representation, (15 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > New Jersey > Monmouth County > Long Branch (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Bartlett, Peter L.

For Valid Generalization the Size of the Weights is More Important than the Size of the Network

Baum and Haussler [4] used these results to give sample size bounds for multi-layer threshold networks that grow at least as quickly as the number of weights (see also [7]). However, for pattern classification applications the VC-bounds seem loose; neural networks often perform successfully with training sets that are considerably smaller than the number of weights. This paper shows that for classification problems on which neural networks perform well, if the weights are not too big, the size of the weights determines the generalization performance. In contrast with the function classes and algorithms considered in the VC-theory, neural networks used for binary classification problems have real-valued outputs, and learning algorithms typically attempt to minimize the squared error of the network output over a training set. As well as encouraging the correct classification, this tends to push the output away from zero and towards the target values of {-1, 1}.

dimension, fat-shattering dimension, misclassification probability, (15 more...)

Country:

North America > United States > New York > New York County > New York City (0.05)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)