AITopics | Regression

Regression

Semi-analytic approximate stability selection for correlated data in generalized linear models

Takahashi, Takashi, Kabashima, Yoshiyuki

arXiv.org Machine Learning, Mar-19-2020

We consider the variable selection problem of generalized linear models (GLMs). Stability selection (SS) is a promising method proposed for solving this problem. Although SS provides practical variable selection criteria, it is computationally demanding because it needs to fit GLMs to many re-sampled datasets. We propose a novel approximate inference algorithm that can conduct SS without the repeated fitting. The algorithm is based on the replica method of statistical mechanics and vector approximate message passing of information theory. For datasets characterized by rotation-invariant matrix ensembles, we derive state evolution equations that macroscopically describe the dynamics of the proposed algorithm. We also show that their fixed points are consistent with the replica symmetric solution obtained by the replica method. Numerical experiments indicate that the algorithm exhibits fast convergence and high approximation accuracy for both synthetic and real-world data.
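
The repeated fitting that makes SS expensive can be sketched as follows. This is a minimal illustration of the naive resampling baseline only, not the approximate algorithm the abstract proposes. It assumes a lasso-penalized linear model instead of a general GLM, half-size subsampling without replacement, and synthetic data; the helper names `lasso_cd` and `stability_selection` are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent on (1/2n)||y - Xw||^2 + lam*||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw; equals y because w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm_sq = sum(c * c for c in col) / n
            if norm_sq == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(col[i] * resid[i] for i in range(n)) / n + norm_sq * w[j]
            new_wj = soft_threshold(rho, lam) / norm_sq
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                w[j] = new_wj
    return w

def stability_selection(X, y, lam, n_resamples=50, seed=0):
    """Fraction of resampled fits in which each variable gets a nonzero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)  # half-size subsample (assumption)
        w = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]

# Synthetic example (assumption): only the first of five variables is relevant.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [3.0 * row[0] + 0.1 * data_rng.gauss(0, 1) for row in X]
probs = stability_selection(X, y, lam=0.5)
```

Each entry of `probs` is an empirical selection probability. The cost is one full fit per resample, which is the expense the abstract aims to avoid.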

algorithm, diagm, selection probability, (15 more...)

arXiv.org Machine Learning

2003.0867

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models

Wu, Shanshan, Sanghavi, Sujay, Dimakis, Alexandros G.

Neural Information Processing Systems, Mar-18-2020, 23:48:00 GMT

We characterize the effectiveness of a classical algorithm for recovering the Markov graph of a general discrete pairwise graphical model from i.i.d. samples. The algorithm is (appropriately regularized) maximum conditional log-likelihood, which involves solving a convex program for each node; for Ising models this is $\ell_1$-constrained logistic regression, while for more general alphabets an $\ell_{2,1}$ group-norm constraint needs to be used. We show that this algorithm can recover any discrete pairwise graphical model, and also characterize its sample complexity as a function of model width, alphabet size, edge parameter accuracy, and the number of variables. We show that along every one of these axes, it matches or improves on all existing results and algorithms for this problem. Our analysis applies a sharp generalization error bound for logistic regression when the weight vector has an $\ell_1$ (or $\ell_{2,1}$) constraint and the sample vector has an $\ell_{\infty}$ (or $\ell_{2, \infty}$) constraint.

algorithm, discrete pairwise graphical model, sparse logistic regression learn, (4 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.89)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.89)

List-decodable Linear Regression

Karmalkar, Sushrut, Klivans, Adam, Kothari, Pravesh

Neural Information Processing Systems, Mar-18-2020, 23:32:39 GMT

We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than 1/2 fraction of examples. The algorithm outputs a list $L$ of size $O(1/\alpha)$ (a fixed constant) that contains an $\ell$ that is close to $\ell^*$. Our algorithm succeeds whenever the inliers are chosen from a certifiably anti-concentrated distribution $D$. In particular, this gives a $(d/\alpha)^{O(1/\alpha^8)}$ time algorithm to find an $O(1/\alpha)$ size list when the inlier distribution is a standard Gaussian. For discrete product distributions that are anti-concentrated only in regular directions, we give an algorithm that achieves a similar guarantee under the promise that $\ell^*$ has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary.

algorithm, equation, list-decodable linear regression, (1 more...)

Neural Information Processing Systems

Genre: Research Report (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)

Iterative Least Trimmed Squares for Mixed Linear Regression

Shen, Yanyao, Sanghavi, Sujay

Neural Information Processing Systems, Mar-18-2020, 23:01:34 GMT

Given a linear regression setting, Iterative Least Trimmed Squares (ILTS) involves alternating between (a) selecting the subset of samples with lowest current loss, and (b) re-fitting the linear model only on that subset. Both steps are very fast and simple. In this paper, we analyze ILTS in the setting of mixed linear regression with corruptions (MLR-C). We first establish deterministic conditions (on the features etc.) under which the ILTS iterate converges linearly to the closest mixture component. We also provide a global algorithm that uses ILTS as a subroutine, to fully solve mixed linear regressions with corruptions.
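
The alternation in steps (a) and (b) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single scalar feature with an intercept, a fixed subset size `keep`, an initial guess near one component, and a small noiseless two-component dataset. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in closed form."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx

def ilts(xs, ys, keep, init, n_iter=20):
    """Iterative least trimmed squares for a 1-D linear model."""
    a, b = init
    for _ in range(n_iter):
        # (a) select the `keep` samples with the lowest current squared loss
        order = sorted(range(len(xs)),
                       key=lambda i: (ys[i] - (a * xs[i] + b)) ** 2)
        subset = order[:keep]
        # (b) refit the linear model on that subset only
        a, b = fit_line([xs[i] for i in subset], [ys[i] for i in subset])
    return a, b

# Two mixed components (assumption): y = 2x + 1 (10 points), y = 6 - x (6 points).
xs = [float(x) for x in range(10)] + [float(x) for x in range(6)]
ys = [2.0 * x + 1.0 for x in range(10)] + [6.0 - x for x in range(6)]

# Starting near the first component, the iterate settles on a = 2, b = 1.
a, b = ilts(xs, ys, keep=8, init=(1.8, 0.5))
```

The first pass still admits one point from the other component into the subset. The refit moves the line close enough that the second pass selects only points from the nearby component.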

linear regression, mixed linear regression, trimmed square, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

A First-Order Algorithmic Framework for Distributionally Robust Logistic Regression

Li, Jiajin, Huang, Sen, So, Anthony Man-Cho

Neural Information Processing Systems, Mar-18-2020, 22:02:07 GMT

Wasserstein distance-based distributionally robust optimization (DRO) has received much attention lately due to its ability to provide a robustness interpretation of various learning models. Moreover, many of the DRO problems that arise in the learning context admit exact convex reformulations and hence can be tackled by off-the-shelf solvers. Nevertheless, the use of such solvers severely limits the applicability of DRO in large-scale learning problems, as they often rely on general purpose interior-point algorithms. On the other hand, there are very few works that attempt to develop fast iterative methods to solve these DRO problems, which typically possess complicated structures. In this paper, we take a first step towards resolving the above difficulty by developing a first-order algorithmic framework for tackling a class of Wasserstein distance-based distributionally robust logistic regression (DRLR) problems. Specifically, we propose a novel linearized proximal ADMM to solve the DRLR problem, whose objective is convex but consists of a smooth term plus two non-separable non-smooth terms.

distributionally robust logistic regression, first-order algorithmic framework, wasserstein distance-based distributionally, (2 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.64)
Research Report > Experimental Study (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.64)

First order expansion of convex regularized estimators

Bellec, Pierre, Kuchibhotla, Arun

Neural Information Processing Systems, Mar-18-2020, 21:48:16 GMT

We consider first order expansions of convex penalized estimators in high-dimensional regression problems with random designs. Our setting includes linear regression and logistic regression as special cases. For a given penalty function $h$ and the corresponding penalized estimator $\hat{\beta}$, we construct a quantity $\eta$, the first order expansion of $\hat{\beta}$, such that the distance between $\hat{\beta}$ and $\eta$ is an order of magnitude smaller than the estimation error $\|\hat{\beta} - \beta^*\|$. In this sense, the first order expansion $\eta$ can be thought of as a generalization of influence functions from the mathematical statistics literature to regularized estimators in high dimensions. Such a first order expansion implies that the risk of $\hat{\beta}$ is asymptotically the same as the risk of $\eta$, which leads to a precise characterization of the MSE of $\hat{\beta}$; this characterization takes a particularly simple form for isotropic design.

estimator, order expansion, regularized estimator, (3 more...)

Neural Information Processing Systems

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.65)

Partitioning Structure Learning for Segmented Linear Regression Trees

Zheng, Xiangyu, Chen, Song Xi

Neural Information Processing Systems, Mar-18-2020, 21:16:51 GMT

This paper proposes a partitioning structure learning method for segmented linear regression trees (SLRT), which assigns linear predictors over the terminal nodes. The recursive partitioning process is driven by an adaptive split selection algorithm that maximizes, at each node, a criterion function based on a conditional Kendall's τ statistic that measures the rank dependence between the regressors and the fitted linear residuals. Theoretical analysis shows that the split selection algorithm permits consistent identification and estimation of the unknown segments. A sufficiently large tree is induced by applying the split selection algorithm recursively. Then the minimal cost-complexity tree pruning procedure is applied to attain the right-sized tree, which ensures (i) the nested structure of pruned subtrees and (ii) consistent estimation of the number of segments.

partitioning structure learning, predictor, segmented linear regression tree, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.65)

Fast Sparse Group Lasso

Ida, Yasutoshi, Fujiwara, Yasuhiro, Kashima, Hisashi

Neural Information Processing Systems, Mar-18-2020, 21:02:54 GMT

Sparse Group Lasso is a method of linear regression analysis that finds sparse parameters in terms of both feature groups and individual features. Block Coordinate Descent is a standard approach to obtain the parameters of Sparse Group Lasso, and iteratively updates the parameters for each parameter group. However, as an update of only one parameter group depends on all the parameter groups or data points, the computation cost is high when the number of the parameters or data points is large. This paper proposes a fast Block Coordinate Descent for Sparse Group Lasso. It efficiently skips the updates of the groups whose parameters must be zeros by using the parameters in one group. In addition, it preferentially updates parameters in a candidate group set, which contains groups whose parameters must not be zeros.
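
For context, the way a whole parameter group becomes zero under the Sparse Group Lasso penalty can be seen in the standard per-group proximal step: elementwise soft-thresholding followed by a group-level shrinkage. The sketch below shows that textbook operator only. It is not the faster skipping rule the abstract proposes, and the function name `sgl_prox` and the threshold names `l1` and `l2` are hypothetical.

```python
import math

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def sgl_prox(v, l1, l2):
    """Proximal operator of l1*||.||_1 + l2*||.||_2 for one parameter group.

    Individual entries are zeroed by the l1 threshold; the whole group is
    zeroed when the norm of the thresholded vector does not exceed l2.
    """
    s = [soft_threshold(z, l1) for z in v]
    norm = math.sqrt(sum(z * z for z in s))
    if norm <= l2:
        return [0.0] * len(v)  # the entire group is set to zero
    scale = 1.0 - l2 / norm
    return [scale * z for z in s]

whole_group_zero = sgl_prox([0.5, -0.3], l1=1.0, l2=0.1)   # [0.0, 0.0]
group_shrunk = sgl_prox([3.0, 0.0, -4.0], l1=0.0, l2=2.5)  # [1.5, 0.0, -2.0]
mixed = sgl_prox([3.0, 0.5], l1=1.0, l2=1.0)               # [1.0, 0.0]
```

Evaluating the group-level test requires the thresholded vector for that group, which is part of the per-update cost that a skipping rule tries to avoid.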

block coordinate descent, fast sparse group lasso, sparse group lasso, (1 more...)

Neural Information Processing Systems

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.64)

Differentially Private Bayesian Linear Regression

Bernstein, Garrett, Sheldon, Daniel R.

Neural Information Processing Systems, Mar-18-2020, 20:31:53 GMT

Linear regression is an important tool across many fields that work with sensitive human-sourced data. Significant prior work has focused on producing differentially private point estimates, which provide a privacy guarantee to individuals while still allowing modelers to draw insights from data by estimating regression coefficients. We investigate the problem of Bayesian linear regression, with the goal of computing posterior distributions that correctly quantify uncertainty given privately released statistics. We show that a naive approach that ignores the noise injected by the privacy mechanism does a poor job in realistic data settings. We then develop noise-aware methods that perform inference over the privacy mechanism and produce correct posteriors across a wide range of scenarios.

differentially private bayesian linear regression, privacy mechanism

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Predicting Performance of Asynchronous Differentially-Private Learning

Farokhi, Farhad, Kaafar, Mohamed Ali

arXiv.org Machine Learning, Mar-18-2020

We consider training machine learning models using training data located on multiple private and geographically-scattered servers with different privacy settings. Due to the distributed nature of the data, communicating with all collaborating private data owners simultaneously may prove challenging or altogether impossible. In this paper, we develop differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets. The asynchronous nature of the algorithms implies that a central learner interacts with the private data owners one-on-one whenever they are available for communication without needing to aggregate query responses to construct gradients of the entire fitness function. Therefore, the algorithm efficiently scales to many data owners. We define the cost of privacy as the difference between the fitness of a privacy-preserving machine-learning model and the fitness of a machine-learning model trained in the absence of privacy concerns. We prove that we can forecast the performance of the proposed privacy-preserving asynchronous algorithms. We demonstrate that the cost of privacy has an upper bound that is inversely proportional to the combined size of the training datasets squared and the sum of the privacy budgets squared. We validate the theoretical results with experiments on financial and medical datasets. The experiments illustrate that collaboration among more than 10 data owners with at least 10,000 records with privacy budgets greater than or equal to 1 results in a superior machine-learning model in comparison to a model trained in isolation on only one of the datasets, illustrating the value of collaboration and the cost of privacy. The number of collaborating datasets can be lowered if the privacy budget is higher.

data owner, dataset, ml model, (15 more...)

arXiv.org Machine Learning

2003.085

Country:

Oceania > Australia (0.04)
North America > United States > New York (0.04)
South America > Paraguay > Asunción > Asunción (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)
