AITopics

Gaussian Processes are powerful regression models specified by parametrized mean and covariance functions. The standard approaches to estimating these parameters (known as hyperparameters) are Maximum Likelihood (ML) and Maximum A Posteriori (MAP). In this paper, we propose and investigate predictive approaches to estimating the hyperparameters, namely maximization of Geisser's Surrogate Predictive Probability (GPP) and minimization of the mean square error with respect to GPP, referred to as Geisser's Predictive mean square Error (GPE). We also derive results for the standard Cross-Validation (CV) error and make a comparison. These approaches are tested on a number of problems, and the experimental results show that they are strongly competitive with existing approaches.

1 Introduction

Gaussian Processes (GPs) are powerful regression models that have gained popularity recently, though they have appeared in different forms in the literature for years.
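
As a rough illustration of the standard cross-validation error mentioned in the abstract, the sketch below computes a brute-force leave-one-out squared error for GP regression and compares a few length-scale settings. It is not the paper's method and does not implement GPP or GPE: the squared-exponential kernel, the fixed noise variance, the toy sine data, and the helper names (`sq_exp_kernel`, `solve`, `gp_mean`, `loo_error`) are all assumptions made for this example, and the model is simply refitted once per held-out point.

```python
import math


def sq_exp_kernel(a, b, length_scale):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_mean(train_x, train_y, test_x, length_scale, noise_var):
    """Zero-mean GP posterior mean at test_x: k_*^T (K + noise_var * I)^{-1} y."""
    n = len(train_x)
    gram = [[sq_exp_kernel(train_x[i], train_x[j], length_scale)
             + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(gram, train_y)
    return sum(sq_exp_kernel(test_x, train_x[i], length_scale) * alpha[i] for i in range(n))


def loo_error(xs, ys, length_scale, noise_var=1e-2):
    """Mean squared leave-one-out prediction error, refitting once per held-out point."""
    total = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        pred = gp_mean(rest_x, rest_y, xs[i], length_scale, noise_var)
        total += (pred - ys[i]) ** 2
    return total / len(xs)


# Toy data (an assumption for this example): a sine curve sampled on a grid.
xs = [0.5 * i for i in range(12)]
ys = [math.sin(x) for x in xs]

# Pick the length-scale with the lowest leave-one-out error from a small grid.
candidates = [0.01, 0.3, 1.0, 3.0]
errors = {ls: loo_error(xs, ys, ls) for ls in candidates}
best_length_scale = min(errors, key=errors.get)
```

A grid search is used here only to keep the example short; a gradient-based optimizer over the hyperparameters would be the more usual choice in practice.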

choosing hyperparameter, gaussian process, predictive approach, (12 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
Asia > Singapore (0.05)
North America > United States > New York (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Sugiyama, Masashi, Ogawa, Hidemitsu

Training Data Selection for Optimal Generalization in Trigonometric Polynomial Networks

In this paper, we consider the problem of active learning in trigonometric polynomial networks and give a necessary and sufficient condition on the sample points for achieving the optimal generalization capability. By analyzing the condition from a functional-analytic point of view, we clarify the mechanism by which the optimal generalization capability is achieved. We also show that a set of training examples satisfying the condition not only provides the optimal generalization but also reduces the computational complexity and memory required to calculate the learning results. Finally, examples of sample points satisfying the condition are given, and computer simulations are performed to demonstrate the effectiveness of the proposed active learning method.

operator, sample point, training example, (10 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
North America > United States > New York (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.80)

Slonim, Noam, Tishby, Naftali

Agglomerative Information Bottleneck

This question was recently shown in [9] to be a special case of a much more fundamental problem: What are the features of the variable X that are relevant for the prediction of another, relevance, variable Y?

algorithm, information, mutual information, (14 more...)

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Bayesian Model Selection for Support Vector Machines, Gaussian Processes and Other Kernel Classifiers

Seeger, Matthias

We present a variational Bayesian method for model selection over families of kernel classifiers such as Support Vector Machines or Gaussian processes. The algorithm needs no user interaction and is able to adapt a large number of kernel parameters to given data without having to sacrifice training cases for validation. This opens the possibility of using sophisticated families of kernels in situations where the small "standard kernel" classes are clearly inappropriate. We relate the method to other work done on Gaussian processes and clarify the relation between Support Vector Machines and certain Gaussian process models.

approximation, gaussian process, support vector machine, (11 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Wisconsin (0.05)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.72)

Greedy Importance Sampling

Schuurmans, Dale

I present a simple variation of importance sampling that explicitly searches for important regions in the target distribution. I prove that the technique yields unbiased estimates, and show empirically that it can reduce the variance of standard Monte Carlo estimators. This is achieved by concentrating samples in more significant regions of the sample space.

1 Introduction

It is well known that general inference and learning with graphical models is computationally hard [1], and it is therefore necessary to consider restricted architectures [13] or approximate algorithms to perform these tasks [3, 7]. Among the most convenient and successful techniques are stochastic methods, which are guaranteed to converge to a correct solution in the limit of large samples [10, 11, 12, 15]. These methods can be easily applied to complex inference problems that overwhelm deterministic approaches.
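
For orientation, here is a minimal sketch of plain importance sampling, the baseline that the abstract's method varies. It does not implement the greedy search described above. The toy target and proposal distributions, the test function, and the names `normal_pdf` and `importance_sampling_estimate` are assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(f, target_pdf, proposal_pdf, proposal_sample, n):
    """Estimate E_target[f(X)] as the average of f(x) * target(x) / proposal(x)."""
    total = 0.0
    for _ in range(n):
        x = proposal_sample()
        weight = target_pdf(x) / proposal_pdf(x)
        total += weight * f(x)
    return total / n


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed toy problem: E[X^2] under N(0, 1), sampled through a wider N(0, 2^2) proposal.
estimate = importance_sampling_estimate(
    f=lambda x: x * x,
    target_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    proposal_sample=lambda: rng.gauss(0.0, 2.0),
    n=20000,
)
```

The exact value for this toy problem is 1, so `estimate` should land close to it. How close depends on how well the proposal covers the regions where the weighted integrand is large, which is the issue the abstract's search for important regions addresses.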

estimator, procedure, random variable, (17 more...)

Country:

North America > United States > New York (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Schölkopf, Bernhard, Williamson, Robert C., Smola, Alex J., Shawe-Taylor, John, Platt, John C.

Support Vector Method for Novelty Detection

Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified

algorithm, feature space, outlier, (15 more...)

Country:

North America > United States > Washington > King County > Redmond (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > New York (0.04)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.66)

Rusmevichientong, Paat, Van Roy, Benjamin

An Analysis of Turbo Decoding with Gaussian Densities

We provide an analysis of the turbo decoding algorithm (TDA) in a setting involving Gaussian densities. In this context, we are able to show that the algorithm converges and that, somewhat surprisingly, although the density generated by the TDA may differ significantly from the desired posterior density, the means of these two densities coincide.

converge, gaussian density, matrix, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.49)

Roth, Volker, Steinhage, Volker

Nonlinear Discriminant Analysis Using Kernel Functions

Fisher's linear discriminant analysis (LDA) is a classical multivariate technique for both dimension reduction and classification. The data vectors are transformed into a low-dimensional subspace such that the class centroids are spread out as much as possible. In this subspace, LDA works as a simple prototype classifier with linear decision boundaries. However, in many applications the linear boundaries do not adequately separate the classes. We present a nonlinear generalization of discriminant analysis that uses the kernel trick of representing dot products by kernel functions.
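
The kernel trick named in the abstract can be illustrated with a small, self-contained check: a kernel evaluated in input space equals a dot product in an explicit feature space. The degree-2 polynomial kernel and the helper names below are assumptions chosen for illustration. The sketch shows only the kernel identity, not the discriminant analysis itself.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel, k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D inputs whose dot product reproduces poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x = (1.0, 2.0)
y = (3.0, 4.0)

# Computed directly in the 2-D input space.
kernel_value = poly2_kernel(x, y)

# Computed as a dot product in the 3-D feature space.
feature_value = dot(poly2_features(x), poly2_features(y))
```

Both values agree up to floating-point rounding, which is why an algorithm written purely in terms of dot products can be made nonlinear by swapping in a kernel function without ever building the feature vectors.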

algorithm, discriminant analysis, vector, (16 more...)

Country:

North America > United States > California > Monterey County > Monterey (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.74)

Rätsch, Gunnar, Schölkopf, Bernhard, Smola, Alex J., Müller, Klaus-Robert, Onoda, Takashi, Mika, Sebastian

ν-Arc: Ensemble Learning in the Presence of Outliers

The idea of a large minimum margin [17] explains the good generalization performance of AdaBoost in the low noise regime. However, AdaBoost performs worse on noisy tasks [10, 11], such as the iris and the breast cancer benchmark data sets [1]. On the latter tasks, a large margin on all training points cannot be achieved without adverse effects on the generalization error. This experimental observation was supported by the study of [13], where the generalization error of ensemble methods was bounded by the sum of the fraction of training points which have a margin smaller than some value ρ, say, plus a complexity term depending on the base hypotheses and ρ. While this bound can only capture part of what is going on in practice, it nevertheless already conveys the message that in some cases it pays to allow for some points which have a small margin, or are misclassified, if this leads to a larger overall margin on the remaining points. To cope with this problem, it was mandatory to construct regularized variants of AdaBoost, which traded off the number of margin errors and the size of the margin.
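
The quantity at the center of the bound described above, the fraction of training points whose margin falls below a threshold, can be sketched as follows. This is an illustration only, not the paper's algorithm. It assumes labels and base-hypothesis outputs in {-1, +1} and hypothesis weights normalized to sum to one, and the toy ensemble, the threshold value, and the function names are all hypothetical. The threshold is called `rho` in the code.

```python
def ensemble_margins(base_outputs, weights, labels):
    """Margin y_i * f(x_i), with f a convex combination of base-hypothesis outputs."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    margins = []
    for outputs, label in zip(base_outputs, labels):
        score = sum(w * h for w, h in zip(normalized, outputs))
        margins.append(label * score)
    return margins


def margin_error_fraction(margins, rho):
    """Fraction of training points whose margin is smaller than rho."""
    return sum(1 for m in margins if m < rho) / len(margins)


# Hypothetical toy ensemble: three base hypotheses voting on four training points.
base_outputs = [
    (+1, +1, +1),
    (+1, -1, +1),
    (-1, -1, +1),
    (-1, -1, -1),
]
weights = [2.0, 1.0, 1.0]
labels = [+1, +1, +1, -1]

margins = ensemble_margins(base_outputs, weights, labels)
fraction_below = margin_error_fraction(margins, rho=0.6)
```

Here the third point is misclassified (negative margin) and the second has a small positive margin, so two of the four points count as margin errors at this threshold. Raising `rho` increases that fraction, which is the trade-off the regularized variants are said to balance.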

adaboost, algorithm, ensemble learning, (16 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Oceania > Australia > Queensland (0.04)
(6 more...)

Industry:

Health & Medicine > Therapeutic Area (0.54)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

The Infinite Gaussian Mixture Model

Rasmussen, Carl Edward

In a Bayesian mixture model it is not necessary a priori to limit the number of components to be finite. In this paper an infinite Gaussian mixture model is presented which neatly sidesteps the difficult problem of finding the "right" number of mixture components. Inference in the model is done using an efficient parameter-free Markov Chain that relies entirely on Gibbs sampling.

indicator, mixture model, unrepresented class, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Denmark > Capital Region > Kongens Lyngby (0.14)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)