Localizing Bugs in Program Executions with Graphical Models
Dietz, Laura, Dallmeier, Valentin, Zeller, Andreas, Scheffer, Tobias
We devise a graphical model that supports the process of debugging software by guiding developers to code that is likely to contain defects. The model is trained using execution traces of passing test runs; it reflects the distribution over transitional patterns of code positions. Given a failing test case, the model determines the least likely transitional pattern in the execution trace. The model is designed such that Bayesian inference has a closed-form solution. We evaluate the Bernoulli graph model on data from the software projects AspectJ and Rhino.
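A minimal sketch of the general idea, not the authors' Bernoulli graph model: estimate transition probabilities between consecutive code positions from passing runs, then flag the least likely transition in a failing trace. The function names and toy traces below are hypothetical.

    from collections import Counter

    def train_transitions(passing_traces):
        # Count transitions between consecutive code positions in passing runs.
        counts, outgoing = Counter(), Counter()
        for trace in passing_traces:
            for a, b in zip(trace, trace[1:]):
                counts[(a, b)] += 1
                outgoing[a] += 1
        # Smoothed maximum-likelihood estimate of p(next position | position).
        return lambda a, b: (counts[(a, b)] + 1) / (outgoing[a] + 2)

    def most_suspicious(failing_trace, prob):
        # Flag the least likely transition in the failing run for the developer.
        return min(zip(failing_trace, failing_trace[1:]), key=lambda t: prob(*t))

    passing = [["f:1", "f:2", "f:3"], ["f:1", "f:2", "f:3"]]
    failing = ["f:1", "f:3", "f:2"]
    print(most_suspicious(failing, train_transitions(passing)))  # ('f:1', 'f:3')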
Clustering via LP-based Stabilities
Komodakis, Nikos, Paragios, Nikos, Tziritas, Georgios
A novel center-based clustering algorithm is proposed in this paper. We first formulate clustering as an NP-hard linear integer program, and we then use linear programming and duality theory to derive the solution of this optimization problem. This leads to an efficient and very general algorithm, which works in the dual domain and can cluster data based on an arbitrary set of distances. Despite its generality, it is independent of initialization (unlike EM-like methods such as K-means), has guaranteed convergence, and can also provide online optimality bounds on the quality of the estimated clustering solutions. To deal with the most critical issue in a center-based clustering algorithm (the selection of cluster centers), we also introduce the notion of the stability of a cluster center, a well-defined LP-based quantity that plays a key role in our algorithm's success. Furthermore, we introduce what we call margins (another key ingredient of our algorithm), which can be roughly thought of as dual counterparts to stabilities and allow us to obtain computationally efficient approximations to the latter. Promising experimental results demonstrate the potential of our method.
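As an illustration of the center-based LP formulation (a generic facility-location-style relaxation, not the authors' dual-domain algorithm or their stability/margin machinery), the sketch below solves the relaxed assignment problem with scipy; the variable names and the center-opening penalty mu are assumptions.

    import numpy as np
    from scipy.optimize import linprog
    from scipy.spatial.distance import cdist

    def cluster_lp(points, mu):
        # LP relaxation of center-based clustering: x[p, q] is the degree to
        # which point p is assigned to candidate center q; opening center q
        # costs mu, folded into the diagonal of the distance matrix.
        n = len(points)
        d = cdist(points, points)
        np.fill_diagonal(d, mu)
        c = d.ravel()
        # Each point must be fully assigned: sum_q x[p, q] = 1.
        A_eq = np.zeros((n, n * n))
        for p in range(n):
            A_eq[p, p * n:(p + 1) * n] = 1.0
        # A point may only be assigned to an open center: x[p, q] <= x[q, q].
        rows = []
        for p in range(n):
            for q in range(n):
                if p != q:
                    row = np.zeros(n * n)
                    row[p * n + q], row[q * n + q] = 1.0, -1.0
                    rows.append(row)
        res = linprog(c, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                      A_eq=A_eq, b_eq=np.ones(n), bounds=(0, 1))
        x = res.x.reshape(n, n)
        return np.where(np.diag(x) > 0.5)[0], x

    pts = np.array([[0., 0], [0, 1], [0, 2], [9, 9], [9, 10], [9, 11]])
    centers, x = cluster_lp(pts, mu=2.5)
    print(centers)   # expect the middle point of each group: [1 4]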
Performance analysis for $L_2$ kernel classification
We provide statistical performance guarantees for a recently introduced kernel classifier that optimizes the $L_2$ or integrated squared error (ISE) of a difference of densities. The classifier is similar to a support vector machine (SVM) in that it is the solution of a quadratic program and yields a sparse classifier. Unlike SVMs, however, the $L_2$ kernel classifier does not involve a regularization parameter. We prove a distribution-free concentration inequality for a cross-validation-based estimate of the ISE, and apply this result to deduce an oracle inequality and consistency of the classifier in the sense of both ISE and probability of error. Our results can also be specialized to give performance guarantees for an existing method of $L_2$ kernel density estimation.
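The classifier itself solves a sparse quadratic program; as a hedged illustration of the underlying difference-of-densities idea only (a plug-in kernel estimate, not the QP solution), one can classify by the sign of an estimated density difference:

    import numpy as np

    def gauss_kernel(a, b, sigma):
        # Gaussian kernel matrix between row sets a and b.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def dod_classify(x, pos, neg, sigma=1.0):
        # Sign of a kernel estimate of the difference of the two class densities.
        f = gauss_kernel(x, pos, sigma).mean(1) - gauss_kernel(x, neg, sigma).mean(1)
        return np.sign(f)

    rng = np.random.default_rng(0)
    pos = rng.normal(+2, 1, size=(50, 1))
    neg = rng.normal(-2, 1, size=(50, 1))
    print(dod_classify(np.array([[1.5], [-1.5]]), pos, neg))  # [ 1. -1.]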
Playing Pinball with non-invasive BCI
Krauledat, Matthias, Grzeska, Konrad, Sagebaum, Max, Blankertz, Benjamin, Vidaurre, Carmen, Müller, Klaus-Robert, Schröder, Michael
Compared to invasive Brain-Computer Interfaces (BCI), non-invasive BCI systems based on Electroencephalogram (EEG) signals have not been applied successfully for complex control tasks. In the present study, however, we demonstrate that this is possible and report on the interaction of a human subject with a complex real device: a pinball machine. First results in this single-subject study clearly show that fast and well-timed control well beyond chance level is possible, even though the environment is extremely rich and requires complex predictive behavior. Using machine learning methods for mental state decoding, BCI-based pinball control is possible within the first session without the need for lengthy subject training. While the current study is still anecdotal in nature, it clearly shows that very compelling control with excellent timing and dynamics is possible for a non-invasive BCI.
Domain Adaptation with Multiple Sources
Mansour, Yishay, Mohri, Mehryar, Rostamizadeh, Afshin
This paper presents a theoretical analysis of the problem of adaptation with multiple sources. For each source domain, the distribution over the input points as well as a hypothesis with error at most $\epsilon$ are given. The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain. We present several theoretical results relating to this problem. In particular, we prove that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most $\epsilon$ with respect to *any* target mixture of the source distributions. We further generalize the setting from a single target function to multiple consistent target functions and show the existence of a combining rule with error at most $3\epsilon$. Finally, we report empirical results for a multiple-source adaptation problem with a real-world dataset.
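The distribution weighted combining rule described above admits a direct sketch: weight each source hypothesis h_i by lam_i * D_i(x) and normalize. The toy densities and hypotheses below are assumptions made for illustration only.

    import numpy as np
    from scipy.stats import norm

    def distribution_weighted_combination(x, densities, hypotheses, lam):
        # h(x) = sum_i lam_i D_i(x) h_i(x) / sum_j lam_j D_j(x)
        w = np.array([l * D(x) for l, D in zip(lam, densities)])
        preds = np.array([h(x) for h in hypotheses])
        return float((w * preds).sum() / w.sum())

    densities = [norm(-1, 1).pdf, norm(+1, 1).pdf]        # assumed source input densities
    hypotheses = [lambda x: 2 * x, lambda x: 2 * x + 0.1]  # assumed source hypotheses
    print(distribution_weighted_combination(0.5, densities, hypotheses, [0.5, 0.5]))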
Learning a Small Mixture of Trees
The problem of approximating a given probability distribution using a simpler distribution plays an important role in several areas of machine learning, e.g. variational inference and classification. Within this context, we consider the task of learning a mixture of tree distributions. Although mixtures of trees can be learned by minimizing the KL-divergence using an EM algorithm, its success depends heavily on the initialization. We propose an efficient strategy for obtaining a good initial set of trees that attempts to cover the entire observed distribution by minimizing the $\alpha$-divergence with $\alpha = \infty$. We formulate the problem using the fractional covering framework and present a convergent sequential algorithm that only relies on solving a convex program at each iteration. Compared to previous methods, our approach results in a significantly smaller mixture of trees that provides similar or better accuracies. We demonstrate the usefulness of our approach by learning pictorial structures for face recognition.
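A single tree distribution, the building block of such a mixture, can be fit by the classical Chow-Liu procedure: a maximum-weight spanning tree over pairwise mutual information. The sketch below is background for the mixture setting, not the authors' fractional-covering initialization; the binary toy data is an assumption.

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree

    def chow_liu_tree(samples):
        # Pairwise mutual information between binary variables, then a
        # maximum-weight spanning tree (weights negated for the MST routine).
        n, d = samples.shape
        mi = np.zeros((d, d))
        for i in range(d):
            for j in range(i + 1, d):
                for a in (0, 1):
                    for b in (0, 1):
                        p_ab = np.mean((samples[:, i] == a) & (samples[:, j] == b))
                        p_a = np.mean(samples[:, i] == a)
                        p_b = np.mean(samples[:, j] == b)
                        if p_ab > 0:
                            mi[i, j] += p_ab * np.log(p_ab / (p_a * p_b))
        tree = minimum_spanning_tree(-mi)
        return [(int(i), int(j)) for i, j in zip(*tree.nonzero())]

    rng = np.random.default_rng(0)
    x0 = rng.integers(0, 2, 200)
    data = np.column_stack([x0, x0 ^ (rng.random(200) < 0.1), rng.integers(0, 2, 200)])
    print(chow_liu_tree(data))   # expect an edge between columns 0 and 1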
Probabilistic Relational PCA
Li, Wu-jun, Yeung, Dit-Yan, Zhang, Zhihua
One crucial assumption made by both principal component analysis (PCA) and probabilistic PCA (PPCA) is that the instances are independent and identically distributed (i.i.d.). However, this common i.i.d. assumption is unreasonable for relational data. In this paper, by explicitly modeling covariance between instances as derived from the relational information, we propose a novel probabilistic dimensionality reduction method, called probabilistic relational PCA (PRPCA), for relational data analysis. Although the i.i.d. assumption is no longer adopted in PRPCA, the learning algorithms for PRPCA can still be devised as easily as those for PPCA, which make explicit use of the i.i.d. assumption. Experiments on real-world data sets show that PRPCA can effectively utilize the relational information to dramatically outperform PCA and achieve state-of-the-art performance.
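As a loose illustration of injecting relational covariance into PCA (a crude graph-smoothing variant, not the PRPCA model itself; all names are assumptions), one might smooth instances along graph edges before the eigendecomposition:

    import numpy as np

    def relational_pca(X, A, n_components=2):
        # Smooth each instance toward its graph neighbours before PCA, so
        # linked instances are no longer treated as independent.
        L = np.diag(A.sum(1)) - A                  # graph Laplacian from adjacency A
        S = np.linalg.inv(np.eye(len(X)) + L)      # smoothing operator
        Xs = S @ (X - X.mean(0))
        cov = Xs.T @ Xs / len(Xs)
        vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
        return Xs @ vecs[:, -n_components:]

    X = np.random.default_rng(0).normal(size=(4, 3))
    A = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], float)
    print(relational_pca(X, A).shape)   # (4, 2)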
Bayesian Synchronous Grammar Induction
Blunsom, Phil, Cohn, Trevor, Osborne, Miles
We present a novel method for inducing synchronous context free grammars (SCFGs) from a corpus of parallel string pairs. SCFGs can model equivalence between strings in terms of substitutions, insertions and deletions, and the reordering of sub-strings. We develop a non-parametric Bayesian model and apply it to a machine translation task, using priors to replace the various heuristics commonly used in this field. Using a variational Bayes training procedure, we learn the latent structure of translation equivalence through the induction of synchronous grammar categories for phrasal translations, showing improvements in translation performance over previously proposed maximum likelihood models.
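A toy representation of a synchronous rule, illustrating how shared nonterminal indices encode reordering, substitution, insertion, and deletion between the two sides (an assumed data structure, not the authors' model):

    from dataclasses import dataclass

    @dataclass
    class SyncRule:
        # A synchronous CFG rule pairs a source and a target right-hand side;
        # shared integers link nonterminals, so differing orders encode reordering.
        lhs: str
        src: list   # e.g. ["ne", (1, "VP"), "pas"]
        tgt: list   # e.g. ["not", (1, "VP")]

    # Toy rule: French negation "ne X pas" translating to English "not X".
    rule = SyncRule("NEG", ["ne", (1, "VP"), "pas"], ["not", (1, "VP")])
    print(rule.lhs, rule.src, "<->", rule.tgt)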
Learning in Markov Random Fields using Tempered Transitions
Markov random fields (MRFs), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRFs is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely connected MRFs. Our results on the MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.
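A minimal sketch of the Robbins-Monro stochastic approximation loop for a fully visible binary MRF, with a plain persistent Gibbs operator standing in where the paper would use tempered transitions; the model, learning rate, and toy data are assumptions.

    import numpy as np

    def sa_learn_mrf(data, n_steps=500, lr=0.05, seed=0):
        # Stochastic approximation for p(x) proportional to exp(x' W x / 2):
        # follow the gradient (data stats - model stats), where the model
        # statistics come from a persistent Gibbs chain.
        rng = np.random.default_rng(seed)
        n, d = data.shape
        W = np.zeros((d, d))
        chain = rng.integers(0, 2, d).astype(float)
        data_stats = data.T @ data / n
        for t in range(n_steps):
            for i in range(d):   # one Gibbs sweep on the persistent chain
                p = 1 / (1 + np.exp(-(W[i] @ chain - W[i, i] * chain[i])))
                chain[i] = float(rng.random() < p)
            W += lr * (data_stats - np.outer(chain, chain))
            np.fill_diagonal(W, 0.0)   # no self-couplings
        return W

    rng = np.random.default_rng(1)
    x0 = rng.integers(0, 2, (200, 1))
    data = np.hstack([x0, x0]).astype(float)   # two perfectly coupled bits
    print(sa_learn_mrf(data)[0, 1] > 0)        # expect a positive learned coupling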
Linear-time Algorithms for Pairwise Statistical Problems
Ram, Parikshit, Lee, Dongryeol, March, William, Gray, Alexander G.
Several key computational bottlenecks in machine learning involve pairwise distance computations, including all-nearest-neighbors (finding the nearest neighbor(s) for each point, e.g. in manifold learning) and kernel summations (e.g. in kernel density estimation or kernel machines). We consider the general, bichromatic case for these problems, in addition to the scientific problem of N-body potential calculation. In this paper we show for the first time O(N) worst-case runtimes for practical algorithms for these problems based on the cover tree data structure (Beygelzimer, Kakade, Langford, 2006).
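To make the bichromatic all-nearest-neighbors problem concrete, the sketch below answers it with scipy's k-d tree as a stand-in data structure; the paper's O(N) worst-case guarantees rely on cover trees, which scipy does not provide.

    import numpy as np
    from scipy.spatial import cKDTree

    # Bichromatic all-nearest-neighbors: for every query point, find its
    # nearest neighbour in a separate reference set. A space-partitioning
    # tree answers all N queries far faster than the O(N^2) brute-force scan.
    rng = np.random.default_rng(0)
    queries = rng.random((1000, 3))
    references = rng.random((1000, 3))
    dist, idx = cKDTree(references).query(queries, k=1)
    print(idx[:5], dist[:5])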