Goto

Collaborating Authors

 Regression


Phase transitions for high-dimensional joint support recovery

Neural Information Processing Systems

We consider the following instance of transfer learning: given a pair of regression problems, suppose that the regression coefficients share a partially common support, parameterized by the overlap fraction $\overlap$ between the two supports. This set-up suggests the use of $1, \infty$-regularized linear regression for recovering the support sets of both regression vectors. Our main contribution is to provide a sharp characterization of the sample complexity of this $1,\infty$ relaxation, exactly pinning down the minimal sample size $n$ required for joint support recovery as a function of the model dimension $\pdim$, support size $\spindex$ and overlap $\overlap \in [0,1]$. For measurement matrices drawn from standard Gaussian ensembles, we prove that the joint $1,\infty$-regularized method undergoes a phase transition characterized by order parameter $\orpar( umobs, \pdim, \spindex, \overlap) umobs{(4 - 3 \overlap) s \log(p-(2-\overlap)s)}$. More precisely, the probability of successfully recovering both supports converges to $1$ for scalings such that $\orpar 1$, and converges to $0$ to scalings for which $\orpar 1$.


A Family of Penalty Functions for Structured Sparsity

Neural Information Processing Systems

We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. We present a family of convex penalty functions, which encode this prior knowledge by means of a set of constraints on the absolute values of the regression coefficients. This family subsumes the $\ell_1$ norm and is flexible enough to include different models of sparsity patterns, which are of practical and theoretical importance. We establish some important properties of these functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions.


Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference

Neural Information Processing Systems

We consider multivariate regression problems involving high-dimensional predictor and response spaces. To efficiently address such problems, we propose a variable selection method, Multivariate Group Orthogonal Matching Pursuit, which extends the standard Orthogonal Matching Pursuit technique to account for arbitrary sparsity patterns induced by domain-specific groupings over both input and output variables, while also taking advantage of the correlation that may exist between the multiple outputs. We illustrate the utility of this framework for inferring causal relationships over a collection of high-dimensional time series variables. When applied to time-evolving social media content, our models yield a new family of causality-based influence measures that may be seen as an alternative to PageRank. Theoretical guarantees, extensive simulations and empirical studies confirm the generality and value of our framework.


Directed Regression

Neural Information Processing Systems

When used to guide decisions, linear regression analysis typically involves estimation of regression coefficients via ordinary least squares and their subsequent use to make decisions. When there are multiple response variables and features do not perfectly capture their relationships, it is beneficial to account for the decision objective when computing regression coefficients. Empirical optimization does so but sacrifices performance when features are well-chosen or training data are insufficient. We propose directed regression, an efficient algorithm that combines merits of ordinary least squares and empirical optimization. We demonstrate through a computational study that directed regression can generate significant performance gains over either alternative.


Adaptive Multi-Task Lasso: with Application to eQTL Detection

Neural Information Processing Systems

To understand the relationship between genomic variations among population and complex diseases, it is essential to detect eQTLs which are associated with phenotypic effects. However, detecting eQTLs remains a challenge due to complex underlying mechanisms and the very large number of genetic loci involved compared to the number of samples. Thus, to address the problem, it is desirable to take advantage of the structure of the data and prior information about genomic locations such as conservation scores and transcription factor binding sites. In this paper, we propose a novel regularized regression approach for detecting eQTLs which takes into account related traits simultaneously while incorporating many regulatory features. We first present a Bayesian network for a multi-task learning problem that includes priors on SNPs, making it possible to estimate the significance of each covariate adaptively.


Privacy-preserving logistic regression

Neural Information Processing Systems

This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. First we apply an idea of Dwork et al. to design a specific privacy-preserving machine learning algorithm, logistic regression. This involves bounding the sensitivity of logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. Noting that the approach of Dwork et al. has limitations when applied to other machine learning algorithms, we then present another privacy-preserving logistic regression algorithm. The algorithm is based on solving a perturbed objective, and does not depend on the sensitivity.


Optimal learning rates for Kernel Conjugate Gradient regression

Neural Information Processing Systems

We prove rates of convergence in the statistical sense for kernel-based least squares regression using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is directly related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. The rates depend on two key quantities: first, on the regularity of the target regression function and second, on the effective dimensionality of the data mapped into the kernel space. Lower bounds on attainable rates depending on these two quantities were established in earlier literature, and we obtain upper bounds for the considered method that match these lower bounds (up to a log factor) if the true regression function belongs to the reproducing kernel Hilbert space. If the latter assumption is not fulfilled, we obtain similar convergence rates provided additional unlabeled data are available.


Continuous-Time Regression Models for Longitudinal Networks

Neural Information Processing Systems

The development of statistical models for continuous-time longitudinal network data is of increasing interest in machine learning and social science. Leveraging ideas from survival and event history analysis, we introduce a continuous-time regression modeling framework for network event data that can incorporate both time-dependent network statistics and time-varying regression coefficients. We also develop an efficient inference scheme that allows our approach to scale to large networks. On synthetic and real-world data, empirical results demonstrate that the proposed inference approach can accurately estimate the coefficients of the regression model, which is useful for interpreting the evolution of the network; furthermore, the learned model has systematically better predictive performance compared to standard baseline methods. Papers published at the Neural Information Processing Systems Conference.


A Polynomial-time Form of Robust Regression

Neural Information Processing Systems

Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point. We present a general formulation for robust regression --Variational M-estimation--that unifies a number of robust regression methods while allowing a tractable approximation strategy. We develop an estimator that requires only polynomial-time, while achieving certain robustness and consistency guarantees. An experimental evaluation demonstrates the effectiveness of the new estimation approach compared to standard methods. Papers published at the Neural Information Processing Systems Conference.


Sparse Features for PCA-Like Linear Regression

Neural Information Processing Systems

Principal Components Analysis (PCA) is often used as a feature extraction procedure. Given a matrix $X \in \mathbb{R} {n \times d}$, whose rows represent $n$ data points with respect to $d$ features, the top $k$ right singular vectors of $X$ (the so-called \textit{eigenfeatures}), are arbitrary linear combinations of all available features. The eigenfeatures are very useful in data analysis, including the regularization of linear regression. Enforcing sparsity on the eigenfeatures, i.e., forcing them to be linear combinations of only a \textit{small} number of actual features (as opposed to all available features), can promote better generalization error and improve the interpretability of the eigenfeatures. We present deterministic and randomized algorithms that construct such sparse eigenfeatures while \emph{provably} achieving in-sample performance comparable to regularized linear regression.