Goto

Collaborating Authors

 Europe


Efficient Online Learning via Randomized Rounding

Neural Information Processing Systems

Most online algorithms used in machine learning today are based on variants of mirror descent or follow-the-leader. In this paper, we present an online algorithm based on a completely different approach, which combines ``random playout'' and randomized rounding of loss subgradients. As an application of our approach, we provide the first computationally efficient online algorithm for collaborative filtering with trace-norm constrained matrices. As a second application, we solve an open question linking batch learning and transductive online learning.


Inductive reasoning about chimeric creatures

Neural Information Processing Systems

Given one feature of a novel animal, humans readily make inferences about other features of the animal. For example, winged creatures often fly, and creatures that eat fish often live in the water. We explore the knowledge that supports these inferences and compare two approaches. The first approach proposes that humans rely on abstract representations of dependency relationships between features, and is formalized here as a graphical model. The second approach proposes that humans rely on specific knowledge of previously encountered animals, and is formalized here as a family of exemplar models. We evaluate these models using a task where participants reason about chimeras, or animals with pairs of features that have not previously been observed to co-occur. The results support the hypothesis that humans rely on explicit representations of relationships between features.


Thinning Measurement Models and Questionnaire Design

Neural Information Processing Systems

Inferring key unobservable features of individuals is an important task in the applied sciences. In particular, an important source of data in fields such as marketing, social sciences and medicine is questionnaires: answers in such questionnaires are noisy measures of target unobserved features. While comprehensive surveys help to better estimate the latent variables of interest, aiming at a high number of questions comes at a price: refusal to participate in surveys can go up, as well as the rate of missing data; quality of answers can decline; costs associated with applying such questionnaires can also increase. In this paper, we cast the problem of refining existing models for questionnaire data as follows: solve a constrained optimization problem of preserving the maximum amount of information found in a latent variable model using only a subset of existing questions. The goal is to find an optimal subset of a given size. For that, we first define an information theoretical measure for quantifying the quality of a reduced questionnaire. Three different approximate inference methods are introduced to solve this problem. Comparisons against a simple but powerful heuristic are presented.


Robust Multi-Class Gaussian Process Classification

Neural Information Processing Systems

Multi-class Gaussian Process Classifiers (MGPCs) are often affected by over-fitting problems when labeling errors occur far from the decision boundaries. To prevent this, we investigate a robust MGPC (RMGPC) which considers labeling errors independently of their distance to the decision boundaries. Expectation propagation is used for approximate inference. Experiments with several datasets in which noise is injected in the class labels illustrate the benefits of RMGPC. This method performs better than other Gaussian process alternatives based on considering latent Gaussian noise or heavy-tailed processes. When no noise is injected in the labels, RMGPC still performs equal or better than the other methods. Finally, we show how RMGPC can be used for successfully identifying data instances which are difficult to classify accurately in practice.


Additive Gaussian Processes

Neural Information Processing Systems

We introduce a Gaussian process model of functions which are additive. An additive function is one which decomposes into a sum of low-dimensional functions, each depending on only a subset of the input variables. Additive GPs generalize both Generalized Additive Models, and the standard GP models which use squared-exponential kernels. Hyperparameter learning in this model can be seen as Bayesian Hierarchical Kernel Learning (HKL). We introduce an expressive but tractable parameterization of the kernel function, which allows efficient evaluation of all input interaction terms, whose number is exponential in the input dimension. The additional structure discoverable by this model results in increased interpretability, as well as state-of-the-art predictive power in regression tasks.


A Global Structural EM Algorithm for a Model of Cancer Progression

Neural Information Processing Systems

Cancer has complex patterns of progression that include converging as well as diverging progressional pathways. Vogelstein's path model of colon cancer was a pioneering contribution to cancer research. Since then, several attempts have been made at obtaining mathematical models of cancer progression, devising learning algorithms, and applying these to cross-sectional data. Beerenwinkel {\em et al.} provided, what they coined, EM-like algorithms for Oncogenetic Trees (OTs) and mixtures of such. Given the small size of current and future data sets, it is important to minimize the number of parameters of a model. For this reason, we too focus on tree-based models and introduce Hidden-variable Oncogenetic Trees (HOTs). In contrast to OTs, HOTs allow for errors in the data and thereby provide more realistic modeling. We also design global structural EM algorithms for learning HOTs and mixtures of HOTs (HOT-mixtures). The algorithms are global in the sense that, during the M-step, they find a structure that yields a global maximum of the expected complete log-likelihood rather than merely one that improves it. The algorithm for single HOTs performs very well on reasonable-sized data sets, while that for HOT-mixtures requires data sets of sizes obtainable only with tomorrow's more cost-efficient technologies.


Interpolable Formulas in Equilibrium Logic and Answer Set Programming

Journal of Artificial Intelligence Research

Interpolation is an important property of classical and many non-classical logics that has been shown to have interesting applications in computer science and AI. Here we study the Interpolation Property for the the non-monotonic system of equilibrium logic, establishing weaker or stronger forms of interpolation depending on the precise interpretation of the inference relation. These results also yield a form of interpolation for ground logic programs under the answer sets semantics. For disjunctive logic programs we also study the property of uniform interpolation that is closely related to the concept of variable forgetting. The first-order version of equilibrium logic has analogous Interpolation properties whenever the collection of equilibrium models is (first-order) definable. Since this is the case for so-called safe programs and theories, it applies to the usual situations that arise in practical answer set programming.


Multi-Robot Adversarial Patrolling: Facing a Full-Knowledge Opponent

Journal of Artificial Intelligence Research

The problem of adversarial multi-robot patrol has gained interest in recent years, mainly due to its immediate relevance to various security applications. In this problem, robots are required to repeatedly visit a target area in a way that maximizes their chances of detecting an adversary trying to penetrate through the patrol path. When facing a strong adversary that knows the patrol strategy of the robots, if the robots use a deterministic patrol algorithm, then in many cases it is easy for the adversary to penetrate undetected (in fact, in some of those cases the adversary can guarantee penetration). Therefore this paper presents a non-deterministic patrol framework for the robots. Assuming that the strong adversary will take advantage of its knowledge and try to penetrate through the patrol's weakest spot, hence an optimal algorithm is one that maximizes the chances of detection in that point. We therefore present a polynomial-time algorithm for determining an optimal patrol under the Markovian strategy assumption for the robots, such that the probability of detecting the adversary in the patrol's weakest spot is maximized. We build upon this framework and describe an optimal patrol strategy for several robotic models based on their movement abilities (directed or undirected) and sensing abilities (perfect or imperfect), and in different environment models - either patrol around a perimeter (closed polygon) or an open fence (open polyline).


High-dimensional Sparse Inverse Covariance Estimation using Greedy Methods

arXiv.org Machine Learning

In this paper we consider the task of estimating the non-zero pattern of the sparse inverse covariance matrix of a zero-mean Gaussian random vector from a set of iid samples. Note that this is also equivalent to recovering the underlying graph structure of a sparse Gaussian Markov Random Field (GMRF). We present two novel greedy approaches to solving this problem. The first estimates the non-zero covariates of the overall inverse covariance matrix using a series of global forward and backward greedy steps. The second estimates the neighborhood of each node in the graph separately, again using greedy forward and backward steps, and combines the intermediate neighborhoods to form an overall estimate. The principal contribution of this paper is a rigorous analysis of the sparsistency, or consistency in recovering the sparsity pattern of the inverse covariance matrix. Surprisingly, we show that both the local and global greedy methods learn the full structure of the model with high probability given just $O(d\log(p))$ samples, which is a \emph{significant} improvement over state of the art $\ell_1$-regularized Gaussian MLE (Graphical Lasso) that requires $O(d^2\log(p))$ samples. Moreover, the restricted eigenvalue and smoothness conditions imposed by our greedy methods are much weaker than the strong irrepresentable conditions required by the $\ell_1$-regularization based methods. We corroborate our results with extensive simulations and examples, comparing our local and global greedy methods to the $\ell_1$-regularized Gaussian MLE as well as the Neighborhood Greedy method to that of nodewise $\ell_1$-regularized linear regression (Neighborhood Lasso).


Estimation And Selection Via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

arXiv.org Machine Learning

The $\ell_1$-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted $\ell_1$-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted $\ell_1$-penalized estimator in sparse, high-dimensional settings where the number of predictors $p$ can be much larger than the sample size $n$. Adaptive Lasso is considered as a special case. A multistage method is developed to apply an adaptive Lasso recursively. We provide $\ell_q$ oracle inequalities, a general selection consistency theorem, and an upper bound on the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results.