Europe
Comparing Beliefs, Surveys, and Random Walks
Aurell, Erik, Gordon, Uri, Kirkpatrick, Scott
Survey propagation is a powerful technique from statistical physics that has been applied to solve the 3-SAT problem both in principle and in practice. We give, using only probability arguments, a common derivation of survey propagation, belief propagation and several interesting hybrid methods. We then present numerical experiments which use WSAT (a widely used random-walk based SAT solver) to quantify the complexity of the 3-SAT formulae as a function of their parameters, both as randomly generated and after simpli£cation, guided by survey propagation. Some properties of WSAT which have not previously been reported make it an ideal tool for this purpose - its mean cost is proportional to the number of variables in the formula (at a £xed ratio of clauses to variables) in the easy-SAT regime and slightly beyond, and its behavior in the hard-SAT regime appears to re¤ect the underlying structure of the solution space that has been predicted by replica symmetry-breaking arguments. An analysis of the tradeoffs between the various methods of search for satisfying assignments shows WSAT to be far more powerful than has been appreciated, and suggests some interesting new directions for practical algorithm development.
A Direct Formulation for Sparse PCA Using Semidefinite Programming
D', aspremont, Alexandre, Ghaoui, Laurent E., Jordan, Michael I., Lanckriet, Gert R.
We examine the problem of approximating, in the Frobenius-norm sense, a positive, semidefinite symmetric matrix by a rank-one matrix, with an upper bound on the cardinality of its eigenvector. The problem arises in the decomposition of a covariance matrix into sparse factors, and has wide applications ranging from biology to finance. We use a modification of the classical variational representation of the largest eigenvalue of a symmetric matrix, where cardinality is constrained, and derive a semidefinite programming based relaxation for our problem.
Harmonising Chorales by Probabilistic Inference
Allan, Moray, Williams, Christopher
We describe how we used a data set of chorale harmonisations composed by Johann Sebastian Bach to train Hidden Markov Models. Using a probabilistic framework allows us to create a harmonisation system which learns from examples, and which can compose new harmonisations. We make a quantitative comparison of our system's harmonisation performance against simpler models, and provide example harmonisations.
Learning Preferences for Multiclass Problems
Aiolli, Fabio, Sperduti, Alessandro
Many interesting multiclass problems can be cast in the general framework of label ranking defined on a given set of classes. The evaluation for such a ranking is generally given in terms of the number of violated order constraints between classes. In this paper, we propose the Preference Learning Model as a unifying framework to model and solve a large class of multiclass problems in a large margin perspective. In addition, an original kernel-based method is proposed and evaluated on a ranking dataset with state-of-the-art results.
A Large Deviation Bound for the Area Under the ROC Curve
Agarwal, Shivani, Graepel, Thore, Herbrich, Ralf, Roth, Dan
The area under the ROC curve (AUC) has been advocated as an evaluation criterion for the bipartite ranking problem. We study large deviation properties of the AUC; in particular, we derive a distribution-free large deviation bound for the AUC which serves to bound the expected accuracy of a ranking function in terms of its empirical AUC on an independent test sequence. A comparison of our result with a corresponding large deviation result for the classification error rate suggests that the test sample size required to obtain an ɛ-accurate estimate of the expected accuracy of a ranking function with δ-confidence is larger than that required to obtain an ɛ-accurate estimate of the expected error rate of a classification function with the same confidence. A simple application of the union bound allows the large deviation bound to be extended to learned ranking functions chosen from finite function classes.
Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
Zhu, Jerry, Kandola, Jaz, Ghahramani, Zoubin, Lafferty, John D.
We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine labeled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonparametric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is computationally feasible for large datasets. We evaluate the kernels on real datasets using support vector machines, with encouraging results.
Kernel Projection Machine: a New Tool for Pattern Recognition
Zwald, Laurent, Blanchard, Gilles, Massart, Pascal, Vert, Régis
This paper investigates the effect of Kernel Principal Component Analysis (KPCA)within the classification framework, essentially the regularization propertiesof this dimensionality reduction method. KPCA has been previously used as a pre-processing step before applying an SVM but we point out that this method is somewhat redundant from a regularization pointof view and we propose a new algorithm called Kernel Projection Machine to avoid this redundancy, based on an analogy with the statistical framework of regression for a Gaussian white noise model. Preliminary experimental results show that this algorithm reaches the same performances as an SVM.
Self-Tuning Spectral Clustering
Zelnik-manor, Lihi, Perona, Pietro
We study a number of open issues in spectral clustering: (i) Selecting the appropriate scale of analysis, (ii) Handling multi-scale data, (iii) Clustering withirregular background clutter, and, (iv) Finding automatically the number of groups. We first propose that a'local' scale should be used to compute the affinity between each pair of points. This local scaling leads to better clustering especially when the data includes multiple scales and when the clusters are placed within a cluttered background. We further suggest exploiting the structure of the eigenvectors to infer automatically the number of groups. This leads to a new algorithm in which the final randomly initialized k-means stage is eliminated.
Instance-Specific Bayesian Model Averaging for Classification
Visweswaran, Shyam, Cooper, Gregory F.
Classification algorithms typically induce population-wide models that are trained to perform well on average on expected future instances. We introduce a Bayesian framework for learning instance-specific models from data that are optimized to predict well for a particular instance. Based on this framework, we present a lazy instance-specific algorithm called ISA that performs selective model averaging over a restricted class of Bayesian networks. On experimental evaluation, this algorithm shows superior performance over model selection. We intend to apply such instance-specific algorithms to improve the performance of patient-specific predictive models induced from medical data.
Generalization Error Bounds for Collaborative Prediction with Low-Rank Matrices
Srebro, Nathan, Alon, Noga, Jaakkola, Tommi S.
We prove generalization error bounds for predicting entries in a partially observed matrix by fitting the observed entries with a low-rank matrix. In justifying the analysis approach we take to obtain the bounds, we present an example of a class of functions of finite pseudodimension such that the sums of functions from this class have unbounded pseudodimension.