Goto

Collaborating Authors

 Genre


Variable Selection in High Dimensions with Random Designs and Orthogonal Matching Pursuit

arXiv.org Machine Learning

The performance of Orthogonal Matching Pursuit (OMP) for variable selection is analyzed for random designs. When contrasted with the deterministic case, since the performance is here measured after averaging over the distribution of the design matrix, one can have far less stringent sparsity constraints on the coefficient vector. We demonstrate that for exact sparse vectors, the performance of the OMP is similar to known results on the Lasso algorithm [\textit{IEEE Trans. Inform. Theory} \textbf{55} (2009) 2183--2202]. Moreover, variable selection under a more relaxed sparsity assumption on the coefficient vector, whereby one has only control on the $\ell_1$ norm of the smaller coefficients, is also analyzed. As a consequence of these results, we also show that the coefficient estimate satisfies strong oracle type inequalities.


Efficient variational inference in large-scale Bayesian compressed sensing

arXiv.org Machine Learning

We study linear models under heavy-tailed priors from a probabilistic viewpoint. Instead of computing a single sparse most probable (MAP) solution as in standard deterministic approaches, the focus in the Bayesian compressed sensing framework shifts towards capturing the full posterior distribution on the latent variables, which allows quantifying the estimation uncertainty and learning model parameters using maximum likelihood. The exact posterior distribution under the sparse linear model is intractable and we concentrate on variational Bayesian techniques to approximate it. Repeatedly computing Gaussian variances turns out to be a key requisite and constitutes the main computational bottleneck in applying variational techniques in large-scale problems. We leverage on the recently proposed Perturb-and-MAP algorithm for drawing exact samples from Gaussian Markov random fields (GMRF). The main technical contribution of our paper is to show that estimating Gaussian variances using a relatively small number of such efficiently drawn random samples is much more effective than alternative general-purpose variance estimation techniques. By reducing the problem of variance estimation to standard optimization primitives, the resulting variational algorithms are fully scalable and parallelizable, allowing Bayesian computations in extremely large-scale problems with the same memory and time complexity requirements as conventional point estimation techniques. We illustrate these ideas with experiments in image deblurring.


Visual Inference Specification Methods for Modularized Rulebases. Overview and Integration Proposal

arXiv.org Artificial Intelligence

The paper concerns selected rule modularization techniques. Three visual methods for inference specification for modularized rule- bases are described: Drools Flow, BPMN and XTT2. Drools Flow is a popular technology for workflow or process modeling, BPMN is an OMG standard for modeling business processes, and XTT2 is a hierarchical tab- ular system specification method. Because of some limitations of these solutions, several proposals of their integration are given.


Eliciting implicit assumptions of proofs in the MIZAR Mathematical Library by property omission

arXiv.org Artificial Intelligence

When formalizing proofs with interactive theorem provers, it often happens that extra background knowledge (declarative or procedural) about mathematical concepts is employed without the formalizer explicitly invoking it, to help the formalizer focus on the relevant details of the proof. In the contexts of producing and studying a formalized mathematical argument, such mechanisms are clearly valuable. But we may not always wish to suppress background knowledge. For certain purposes, it is important to know, as far as possible, precisely what background knowledge was implicitly employed in a formal proof. In this note we describe an experiment conducted on the MIZAR Mathematical Library of formal mathematical proofs to elicit one such class of implicitly employed background knowledge: properties of functions and relations (e.g., commutativity, asymmetry, etc.).


Gradient-based kernel dimension reduction for supervised learning

arXiv.org Machine Learning

This paper proposes a novel kernel approach to linear dimension reduction for supervised learning. The purpose of the dimension reduction is to find directions in the input space to explain the output as effectively as possible. The proposed method uses an estimator for the gradient of regression function, based on the covariance operators on reproducing kernel Hilbert spaces. In comparison with other existing methods, the proposed one has wide applicability without strong assumptions on the distributions or the type of variables, and uses computationally simple eigendecomposition. Experimental results show that the proposed method successfully finds the effective directions with efficient computation.


Learning Valuation Functions

arXiv.org Machine Learning

In this paper we study the approximate learnability of valuations commonly used throughout economics and game theory for the quantitative encoding of agent preferences. We provide upper and lower bounds regarding the learnability of important subclasses of valuation functions that express no-complementarities. Our main results concern their approximate learnability in the distributional learning (PAC-style) setting. We provide nearly tight lower and upper bounds of $\tilde{\Theta}(n^{1/2})$ on the approximation factor for learning XOS and subadditive valuations, both widely studied superclasses of submodular valuations. Interestingly, we show that the $\tilde{\Omega}(n^{1/2})$ lower bound can be circumvented for XOS functions of polynomial complexity; we provide an algorithm for learning the class of XOS valuations with a representation of polynomial size achieving an $O(n^{\eps})$ approximation factor in time $O(n^{1/\eps})$ for any $\eps > 0$. This highlights the importance of considering the complexity of the target function for polynomial time learning. We also provide new learning results for interesting subclasses of submodular functions. Our upper bounds for distributional learning leverage novel structural results for all these valuation classes. We show that many of these results provide new learnability results in the Goemans et al. model (SODA 2009) of approximate learning everywhere via value queries. We also introduce a new model that is more realistic in economic settings, in which the learner can set prices and observe purchase decisions at these prices rather than observing the valuation function directly. In this model, most of our upper bounds continue to hold despite the fact that the learner receives less information (both for learning in the distributional setting and with value queries), while our lower bounds naturally extend.


Structured Sparsity and Generalization

arXiv.org Machine Learning

We study a class of regularization methods used to learn a linear function from a finite set of examples. The regularizer is expressed as an infimum convolution which involves a set M of linear transformations (see equation (1) below). As we shall see, this regularizer generalizes, depending on the choice of the set M, the regularizers used by several learning algorithms, such as ridge regression, the Lasso, the group Lasso [22], multiple kernel learning [10], the group Lasso with overlap [6], and the regularizers in [16]. We give a bound on the Rademacher average of the linear function class associated with this regularizer. The result matches existing bounds in the above mentioned cases but also admits a novel, dimension free interpretation.


Bayesian nonparametric multivariate convex regression

arXiv.org Machine Learning

X, where f(x) is the gradient of f at x. This is called the convex regression problem. Convex regression can easily be modified to allow concave regression by multiplying all of the values by negative one. Convex regression problems are common in economics, operations research and reinforcement learning. In economics, production functions (Skiba 1978) and consumer preferences (Meyer & Pratt 1968) are often convex, while in operations research and reinforcement learning, value functions for stochastic optimization problems can be convex (Shapiro et al. 2009). If a problem is known to be convex, a convex regression estimate provides advantages over an unrestricted estimate. First, convexity is a powerful regularizer: it places strong conditions on the derivatives--and hence smoothness--of a function. Convexity constraints can substantially reduce overfitting and lead to more accurate predictions. Second, maintaining convexity allows the use of convex optimization solvers when the regression estimate is used in an objective function of an optimization problem. 1 Multivariate convex regression has received relatively little attention in the literature.


Nonlinear Channel Estimation for OFDM System by Complex LS-SVM under High Mobility Conditions

arXiv.org Machine Learning

A nonlinear channel estimator using complex Least Square Support Vector Machines (LS-SVM) is proposed for pilot-aided OFDM system and applied to Long Term Evolution (LTE) downlink under high mobility conditions. The estimation algorithm makes use of the reference signals to estimate the total frequency response of the highly selective multipath channel in the presence of non-Gaussian impulse noise interfering with pilot signals. Thus, the algorithm maps trained data into a high dimensional feature space and uses the structural risk minimization (SRM) principle to carry out the regression estimation for the frequency response function of the highly selective channel. The simulations show the effectiveness of the proposed method which has good performance and high precision to track the variations of the fading channels compared to the conventional LS method and it is robust at high speed mobility.


Controlling Complexity in Part-of-Speech Induction

Journal of Artificial Intelligence Research

We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via para- metric and non-parametric constraints. Our approach enforces word-category association sparsity, adds morphological and orthographic features, and eliminates hard-to-estimate parameters for rare words. We develop an efficient learning algorithm that is not much more computationally intensive than standard training. We also provide an open-source implementation of the algorithm. Our experiments on five diverse languages (Bulgarian, Danish, English, Portuguese, Spanish) achieve significant improvements compared with previous methods for the same task.