Regression
Semi-analytic approximate stability selection for correlated data in generalized linear models
Takahashi, Takashi, Kabashima, Yoshiyuki
We consider the variable selection problem of generalized linear models (GLMs). Stability selection (SS) is a promising method proposed for solving this problem. Although SS provides practical variable selection criteria, it is computationally demanding because it needs to fit GLMs to many re-sampled datasets. We propose a novel approximate inference algorithm that can conduct SS without the repeated fitting. The algorithm is based on the replica method of statistical mechanics and vector approximate message passing of information theory. For datasets characterized by rotation-invariant matrix ensembles, we derive state evolution equations that macroscopically describe the dynamics of the proposed algorithm. We also show that their fixed points are consistent with the replica symmetric solution obtained by the replica method. Numerical experiments indicate that the algorithm exhibits fast convergence and high approximation accuracy for both synthetic and real-world data.
Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models
Wu, Shanshan, Sanghavi, Sujay, Dimakis, Alexandros G.
We characterize the effectiveness of a classical algorithm for recovering the Markov graph of a general discrete pairwise graphical model from i.i.d. The algorithm is (appropriately regularized) maximum conditional log-likelihood, which involves solving a convex program for each node; for Ising models this is $\ell_1$-constrained logistic regression, while for more general alphabets an $\ell_{2,1}$ group-norm constraint needs to be used. We show that this algorithm can recover any arbitrary discrete pairwise graphical model, and also characterize its sample complexity as a function of model width, alphabet size, edge parameter accuracy, and the number of variables. We show that along every one of these axes, it matches or improves on all existing results and algorithms for this problem. Our analysis applies a sharp generalization error bound for logistic regression when the weight vector has an $\ell_1$ (or $\ell_{2,1}$) constraint and the sample vector has an $\ell_{\infty}$ (or $\ell_{2, \infty}$) constraint.
List-decodable Linear Regression
Karmalkar, Sushrut, Klivans, Adam, Kothari, Pravesh
We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than 1/2 fraction of examples. It outputs a list L of size O(1/\alpha) - a fixed constant - that contains an \ell that is close to \ell *. Our algorithm succeeds whenever the inliers are chosen from a certifiably anti-concentrated distribution D. In particular, this gives a (d/\alpha) {O(1/\alpha 8)} time algorithm to find a O(1/\alpha) size list when the inlier distribution is a standard Gaussian. For discrete product distributions that are anti-concentrated only in regular directions, we give an algorithm that achieves similar guarantee under the promise that \ell * has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary.
Iterative Least Trimmed Squares for Mixed Linear Regression
Given a linear regression setting, Iterative Least Trimmed Squares (ILTS) involves alternating between (a) selecting the subset of samples with lowest current loss, and (b) re-fitting the linear model only on that subset. Both steps are very fast and simple. In this paper, we analyze ILTS in the setting of mixed linear regression with corruptions (MLR-C). We first establish deterministic conditions (on the features etc.) under which the ILTS iterate converges linearly to the closest mixture component. We also provide a global algorithm that uses ILTS as a subroutine, to fully solve mixed linear regressions with corruptions.
A First-Order Algorithmic Framework for Distributionally Robust Logistic Regression
LI, JIAJIN, HUANG, SEN, So, Anthony Man-Cho
Wasserstein distance-based distributionally robust optimization (DRO) has received much attention lately due to its ability to provide a robustness interpretation of various learning models. Moreover, many of the DRO problems that arise in the learning context admits exact convex reformulations and hence can be tackled by off-the-shelf solvers. Nevertheless, the use of such solvers severely limits the applicability of DRO in large-scale learning problems, as they often rely on general purpose interior-point algorithms. On the other hand, there are very few works that attempt to develop fast iterative methods to solve these DRO problems, which typically possess complicated structures. In this paper, we take a first step towards resolving the above difficulty by developing a first-order algorithmic framework for tackling a class of Wasserstein distance-based distributionally robust logistic regression (DRLR) problem. Specifically, we propose a novel linearized proximal ADMM to solve the DRLR problem, whose objective is convex but consists of a smooth term plus two non-separable non-smooth terms.
First order expansion of convex regularized estimators
Bellec, Pierre, Kuchibhotla, Arun
We consider first order expansions of convex penalized estimators in high-dimensional regression problems with random designs. Our setting includes linear regression and logistic regression as special cases. For a given penalty function $h$ and the corresponding penalized estimator $\hbeta$, we construct a quantity $\eta$, the first order expansion of $\hbeta$, such that the distance between $\hbeta$ and $\eta$ is an order of magnitude smaller than the estimation error $\ \hat{\beta} - \beta *\ $. In this sense, the first order expansion $\eta$ can be thought of as a generalization of influence functions from the mathematical statistics literature to regularized estimators in high-dimensions. Such first order expansion implies that the risk of $\hat{\beta}$ is asymptotically the same as the risk of $\eta$ which leads to a precise characterization of the MSE of $\hbeta$; this characterization takes a particularly simple form for isotropic design.
Partitioning Structure Learning for Segmented Linear Regression Trees
This paper proposes a partitioning structure learning method for segmented linear regression trees (SLRT), which assigns linear predictors over the terminal nodes. The recursive partitioning process is driven by an adaptive split selection algorithm that maximizes, at each node, a criterion function based on a conditional Kendall's τ statistic that measures the rank dependence between the regressors and the fit- ted linear residuals. Theoretical analysis shows that the split selection algorithm permits consistent identification and estimation of the unknown segments. A suffi- ciently large tree is induced by applying the split selection algorithm recursively. Then the minimal cost-complexity tree pruning procedure is applied to attain the right-sized tree, that ensures (i) the nested structure of pruned subtrees and (ii) consistent estimation to the number of segments.
Fast Sparse Group Lasso
Ida, Yasutoshi, Fujiwara, Yasuhiro, Kashima, Hisashi
Sparse Group Lasso is a method of linear regression analysis that finds sparse parameters in terms of both feature groups and individual features. Block Coordinate Descent is a standard approach to obtain the parameters of Sparse Group Lasso, and iteratively updates the parameters for each parameter group. However, as an update of only one parameter group depends on all the parameter groups or data points, the computation cost is high when the number of the parameters or data points is large. This paper proposes a fast Block Coordinate Descent for Sparse Group Lasso. It efficiently skips the updates of the groups whose parameters must be zeros by using the parameters in one group. In addition, it preferentially updates parameters in a candidate group set, which contains groups whose parameters must not be zeros.
Differentially Private Bayesian Linear Regression
Bernstein, Garrett, Sheldon, Daniel R.
Linear regression is an important tool across many fields that work with sensitive human-sourced data. Significant prior work has focused on producing differentially private point estimates, which provide a privacy guarantee to individuals while still allowing modelers to draw insights from data by estimating regression coefficients. We investigate the problem of Bayesian linear regression, with the goal of computing posterior distributions that correctly quantify uncertainty given privately released statistics. We show that a naive approach that ignores the noise injected by the privacy mechanism does a poor job in realistic data settings. We then develop noise-aware methods that perform inference over the privacy mechanism and produce correct posteriors across a wide range of scenarios.
Predicting Performance of Asynchronous Differentially-Private Learning
Farokhi, Farhad, Kaafar, Mohamed Ali
We consider training machine learning models using Training data located on multiple private and geographically-scattered servers with different privacy settings. Due to the distributed nature of the data, communicating with all collaborating private data owners simultaneously may prove challenging or altogether impossible. In this paper, we develop differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets. The asynchronous nature of the algorithms implies that a central learner interacts with the private data owners one-on-one whenever they are available for communication without needing to aggregate query responses to construct gradients of the entire fitness function. Therefore, the algorithm efficiently scales to many data owners. We define the cost of privacy as the difference between the fitness of a privacy-preserving machine-learning model and the fitness of trained machine-learning model in the absence of privacy concerns. We prove that we can forecast the performance of the proposed privacy-preserving asynchronous algorithms. We demonstrate that the cost of privacy has an upper bound that is inversely proportional to the combined size of the training datasets squared and the sum of the privacy budgets squared. We validate the theoretical results with experiments on financial and medical datasets. The experiments illustrate that collaboration among more than 10 data owners with at least 10,000 records with privacy budgets greater than or equal to 1 results in a superior machine-learning model in comparison to a model trained in isolation on only one of the datasets, illustrating the value of collaboration and the cost of the privacy. The number of the collaborating datasets can be lowered if the privacy budget is higher.