Regression
PrivLogit: Efficient Privacy-preserving Logistic Regression by Tailoring Numerical Optimizers
Xie, Wei, Wang, Yang, Boker, Steven M., Brown, Donald E.
Safeguarding privacy in machine learning is highly desirable, especially in collaborative studies across many organizations. Privacy-preserving distributed machine learning (based on cryptography) is popular to solve the problem. However, existing cryptographic protocols still incur excess computational overhead. Here, we make a novel observation that this is partially due to naive adoption of mainstream numerical optimization (e.g., Newton method) and failing to tailor for secure computing. This work presents a contrasting perspective: customizing numerical optimization specifically for secure settings. We propose a seemingly less-favorable optimization method that can in fact significantly accelerate privacy-preserving logistic regression. Leveraging this new method, we propose two new secure protocols for conducting logistic regression in a privacy-preserving and distributed manner. Extensive theoretical and empirical evaluations prove the competitive performance of our two secure proposals while without compromising accuracy or privacy: with speedup up to 2.3x and 8.1x, respectively, over state-of-the-art; and even faster as data scales up. Such drastic speedup is on top of and in addition to performance improvements from existing (and future) state-of-the-art cryptography. Our work provides a new way towards efficient and practical privacy-preserving logistic regression for large-scale studies which are common for modern science.
Excluding variables from a logistic regression model based on correlation
To start with, usually, the cases where Logistic Regression is performed is when the cases of interest are small in no ( 5%) - like in your case, small size of frauds. The intention is to identify patterns to be able to identify fraud before their fraud in future. Second, when you say variables are correlated, they also generally have a similar information in terms of business sense. For eg., Price variables & Discount are two correlated variables, yet different transformations of the same kind of data. So, it makes sense to keep just one! Similarly, in your case, it's best to isolate such cases.
Jackknife logistic and linear regression for clustering and predictions
This article discusses a far more general version of the technique described in our article The best kept secret about regression. Here we adapt our methodology so that it applies to data sets with a more complex structure, in particular with highly correlated independent variables. Our goal is to produce a regression tool that can be used as a black box, be very robust and parameter-free, and usable and easy-to-interpret by non-statisticians. It is part of a bigger project: automating many fundamental data science tasks, to make it easy, scalable and cheap for data consumers, not just for data experts. Readers are invited to further formalize the technology outlined here, and challenge my proposed methodology.
How to choose algorithms for Microsoft Azure Machine Learning
The answer to the question "What machine learning algorithm should I use?" is always "It depends." It depends on the size, quality, and nature of the data. It depends what you want to do with the answer. It depends on how the math of the algorithm was translated into instructions for the computer you are using. And it depends on how much time you have. Even the most experienced data scientists can't tell which algorithm will perform best before trying them. The Microsoft Azure Machine Learning Algorithm Cheat Sheet helps you choose the right machine learning algorithm for your predictive analytics solutions from the Microsoft Azure Machine Learning library of algorithms.
Operator-valued Kernels for Learning from Functional Response Data
Kadri, Hachem, Duflos, Emmanuel, Preux, Philippe, Canu, Stรฉphane, Rakotomamonjy, Alain, Audiffren, Julien
In this paper we consider the problems of supervised classification and regression in the case where attributes and labels are functions: a data is represented by a set of functions, and the label is also a function. We focus on the use of reproducing kernel Hilbert space theory to learn from such functional data. Basic concepts and properties of kernel-based learning are extended to include the estimation of function-valued functions. In this setting, the representer theorem is restated, a set of rigorously defined infinite-dimensional operator-valued kernels that can be valuably applied when the data are functions is described, and a learning algorithm for nonlinear functional data analysis is introduced. The methodology is illustrated through speech and audio signal processing experiments.
24 Uses of Statistical Modeling (Part I)
Here we discuss general applications of statistical models, whether they arise from data science, operations research, engineering, machine learning or statistics. We do not discuss specific algorithms such as decision trees, logistic regression, Bayesian modeling, Markov models, data reduction or feature selection. Instead, I discuss frameworks - each one using its own types of techniques and algorithms - to solve real life problems. Most of the entries below are found in Wikipedia, and I have used a few definitions or extracts from the relevant Wikipedia articles, in addition to personal contributions. Spatial dependency is the co-variation of properties within geographic space: characteristics at proximal locations appear to be correlated, either positively or negatively. Methods for time series analyses may be divided into two classes: frequency-domain methods and time-domain methods.
Algorithms for Fitting the Constrained Lasso
We compare alternative computing strategies for solving the constrained lasso problem. As its name suggests, the constrained lasso extends the widely-used lasso to handle linear constraints, which allow the user to incorporate prior information into the model. In addition to quadratic programming, we employ the alternating direction method of multipliers (ADMM) and also derive an efficient solution path algorithm. Through both simulations and real data examples, we compare the different algorithms and provide practical recommendations in terms of efficiency and accuracy for various sizes of data. We also show that, for an arbitrary penalty matrix, the generalized lasso can be transformed to a constrained lasso, while the converse is not true. Thus, our methods can also be used for estimating a generalized lasso, which has wide-ranging applications. Code for implementing the algorithms is freely available in the Matlab toolbox SparseReg.
Statistical Inference for Model Parameters in Stochastic Gradient Descent
Chen, Xi, Lee, Jason D., Tong, Xin T., Zhang, Yichen
The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing work focuses on the convergence of the objective function or the error of the obtained solution, we investigate the problem of statistical inference of the true model parameters based on SGD. To this end, we propose two consistent estimators of the asymptotic covariance of the average iterate from SGD: (1) an intuitive plug-in estimator and (2) a computationally more efficient batch-means estimator, which only uses the iterates from SGD. As the SGD process forms a time-inhomogeneous Markov chain, our batch-means estimator with carefully chosen increasing batch sizes generalizes the classical batch-means estimator designed for time-homogenous Markov chains. The proposed batch-means estimator is of independent interest, which can be potentially used for estimating the covariance of other time-inhomogeneous Markov chains. Both proposed estimators allow us to construct asymptotically exact confidence intervals and hypothesis tests. We further discuss an extension to conducting inference based on SGD for high-dimensional linear regression. Using a variant of the SGD algorithm, we construct a debiased estimator of each regression coefficient that is asymptotically normal. This gives a one-pass algorithm for computing both the sparse regression coefficient estimator and confidence intervals, which is computationally attractive and applicable to online data.
High-dimensional regression adjustments in randomized experiments
Wager, Stefan, Du, Wenfei, Taylor, Jonathan, Tibshirani, Robert
We study the problem of treatment effect estimation in randomized experiments with high-dimensional covariate information, and show that essentially any risk-consistent regression adjustment can be used to obtain efficient estimates of the average treatment effect. Our results considerably extend the range of settings where high-dimensional regression adjustments are guaranteed to provide valid inference about the population average treatment effect. We then propose cross-estimation, a simple method for obtaining finite-sample-unbiased treatment effect estimates that leverages high-dimensional regression adjustments. Our method can be used when the regression model is estimated using the lasso, the elastic net, subset selection, etc. Finally, we extend our analysis to allow for adaptive specification search via cross-validation, and flexible non-parametric regression adjustments with machine learning methods such as random forests or neural networks.
How To Implement Simple Linear Regression From Scratch With Python - Machine Learning Mastery
Linear regression is a prediction method that is more than 200 years old. Simple linear regression is a great first machine learning algorithm to implement as it requires you to estimate properties from your training dataset, but is simple enough for beginners to understand. In this tutorial, you will discover how to implement the simple linear regression algorithm from scratch in Python. How To Implement Simple Linear Regression From Scratch With Python Photo by Kamyar Adl, some rights reserved. This section is divided into two parts, a description of the simple linear regression technique and a description of the dataset to which we will later apply it.