Statistical Learning
Assessing Approximations for Gaussian Process Classification
Kuss, Malte, Rasmussen, Carl E.
Gaussian processes are attractive models for probabilistic classification but unfortunately exact inference is analytically intractable. We compare Laplace's method and Expectation Propagation (EP) focusing on marginal likelihood estimates and predictive performance. We explain theoretically and corroborate empirically that EP is superior to Laplace. We also compare to a sophisticated MCMC scheme and show that EP is surprisingly accurate. In recent years models based on Gaussian process (GP) priors have attracted much attention in the machine learning community. Whereas inference in the GP regression model with Gaussian noise can be done analytically, probabilistic classification using GPs is analytically intractable. Several approaches to approximate Bayesian inference have been suggested, including Laplace's approximation, Expectation Propagation (EP), variational approximations and Markov chain Monte Carlo (MCMC) sampling, some of these in conjunction with generalisation bounds, online learning schemes and sparse approximations. Despite the abundance of recent work on probabilistic GP classifiers, most experimental studies provide only anecdotal evidence, and no clear picture has yet emerged, as to when and why which algorithm should be preferred.
Generalization in Clustering with Unobserved Features
We argue that when objects are characterized by many attributes, clustering themon the basis of a relatively small random subset of these attributes can capture information on the unobserved attributes as well. Moreover, we show that under mild technical conditions, clustering the objects on the basis of such a random subset performs almost as well as clustering with the full attribute set. We prove a finite sample generalization theoremsfor this novel learning scheme that extends analogous results from the supervised learning setting. The scheme is demonstrated for collaborative filtering of users with movies rating as attributes.
Measuring Shared Information and Coordinated Activity in Neuronal Networks
Klinkner, Kristina, Shalizi, Cosma, Camperi, Marcelo
This activity often manifests itself as dynamically coordinated sequences of action potentials. Since multiple electrode recordings are now a standard tool in neuroscience research, it is important to have a measure of such network-wide behavioral coordinationand information sharing, applicable to multiple neural spike train data. We propose a new statistic, informational coherence, which measures how much better one unit can be predicted by knowing the dynamical state of another. We argue informational coherence is a measure of association and shared information which is superior to traditional pairwisemeasures of synchronization and correlation. To find the dynamical states, we use a recently-introduced algorithm which reconstructs effectivestate spaces from stochastic time series.
Benchmarking Non-Parametric Statistical Tests
Keller, Mikaela, Bengio, Samy, Wong, Siew Y.
Although nonparametric tests have already been proposed for that purpose, statisticalsignificance tests for nonstandard measures (different from the classification error) are less often used in the literature. This paper is an attempt at empirically verifying how these tests compare with more classical tests, on various conditions. More precisely, using a very large dataset to estimate the whole "population", we analyzed the behavior ofseveral statistical test, varying the class unbalance, the compared models, the performance measure, and the sample size. The main result isthat providing big enough evaluation sets nonparametric tests are relatively reliable in all conditions.
A matching pursuit approach to sparse Gaussian process regression
In this paper we propose a new basis selection criterion for building sparse GP regression models that provides promising gains in accuracy as well as efficiency over previous methods. Our algorithm is much faster than that of Smola and Bartlett, while, in generalization it greatly outperforms theinformation gain approach proposed by Seeger et al, especially on the quality of predictive distributions.
Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
Kapoor, Ashish, Ahn, Hyungil, Qi, Yuan, Picard, Rosalind W.
There have been many graph-based approaches for semi-supervised classification. Oneproblem is that of hyperparameter learning: performance depends greatly on the hyperparameters of the similarity graph, transformation ofthe graph Laplacian and the noise model. We present a Bayesian framework for learning hyperparameters for graph-based semisupervised classification.Given some labeled data, which can contain inaccurate labels, we pose the semi-supervised classification as an inference problemover the unknown labels. Expectation Propagation is used for approximate inference and the mean of the posterior is used for classification. The hyperparameters are learned using EM for evidence maximization. We also show that the posterior mean can be written in terms of the kernel matrix, providing a Bayesian classifier to classify new points. Tests on synthetic and real datasets show cases where there are significant improvements in performance over the existing approaches.
Worst-Case Bounds for Gaussian Process Models
Kakade, Sham M., Seeger, Matthias W., Foster, Dean P.
Dean P. Foster University of Pennsylvania We present a competitive analysis of some nonparametric Bayesian algorithms ina worst-case online learning setting, where no probabilistic assumptions about the generation of the data are made. We consider models which use a Gaussian process prior (over the space of all functions) andprovide bounds on the regret (under the log loss) for commonly usednon-parametric Bayesian algorithms -- including Gaussian regression and logistic regression -- which show how these algorithms can perform favorably under rather general conditions.
Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
Juditsky, Anatoli, Nazin, Alexander, Tsybakov, Alexandre, Vayatis, Nicolas
For this purpose, we propose a stochastic procedure, the mirror descent, which performs gradient descent inthe dual space. The generated estimates are additionally averaged in a recursive fashion with specific weights. Mirror descent algorithms havebeen developed in different contexts and they are known to be particularly efficient in high dimensional problems. Moreover their implementation is adapted to the online setting. The main result of the paper is the upper bound on the convergence rate for the generalization error.
A Probabilistic Approach for Optimizing Spectral Clustering
Jin, Rong, Kang, Feng, Ding, Chris H.
Spectral clustering enjoys its success in both data clustering and semisupervised learning.But, most spectral clustering algorithms cannot handle multi-class clustering problems directly. Additional strategies are needed to extend spectral clustering algorithms to multi-class clustering problems.Furthermore, most spectral clustering algorithms employ hard cluster membership, which is likely to be trapped by the local optimum. Inthis paper, we present a new spectral clustering algorithm, named "Soft Cut". It improves the normalized cut algorithm by introducing softmembership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm.