Goto

Collaborating Authors

 Regression


Intro to Amazon Machine Learning with Logistic Regression – BMC Blogs

#artificialintelligence

Here we look at Amazon's Machine Learning cloud service. In this first article we will look at logistic regression. In future blog posts we will see what other algorithms it offers. Remember that logistic regression is similar to linear regression. It looks at a series of independent variables and calculates one dependant variable.


Gigaom How Machines Learn: The Top Four Approaches to ML in Business

#artificialintelligence

Supervised learning outputs typically have one of two forms. Regression outputs are real-valued numbers that exist in a continuous space. For instance, many of Vidora's eCommerce customers want to forecast how much money each customer is likely to spend, so that high-value customer may be targeted with personalized promotional offers. A simple linear regression structures this problem through the familiar formula y mx b, where y is predicted expenditure and x is some attribute of each customer -- say, number of site visits. During training, we supply labeled input-output pairs -- i.e. customers for which transaction history is already known -- and the algorithm finds the optimal parameters m and b to make this relationship as accurate as possible.


Efficient Discovery of Heterogeneous Treatment Effects in Randomized Experiments via Anomalous Pattern Detection

arXiv.org Machine Learning

The randomized experiment is an important tool for inferring the causal impact of an intervention. The recent literature on statistical learning methods for heterogeneous treatment effects demonstrates the utility of estimating the marginal conditional average treatment effect (MCATE), i.e., the average treatment effect for a subpopulation of respondents who share a particular subset of covariates. However, each proposed method makes its own set of restrictive assumptions about the intervention's effects, the underlying data generating processes, and which subpopulations (MCATEs) to explicitly estimate. Moreover, the majority of the literature provides no mechanism to identify which subpopulations are the most affected--beyond manual inspection--and provides little guarantee on the correctness of the identified subpopulations. Therefore, we propose Treatment Effect Subset Scan (TESS), a new method for discovering which subpopulation in a randomized experiment is most significantly affected by a treatment. We frame this challenge as a pattern detection problem where we maximize a nonparametric scan statistic (measurement of distributional divergence) over subpopulations, while being parsimonious in which specific subpopulations to evaluate. Furthermore, we identify the subpopulation which experiences the largest distributional change as a result of the intervention, while making minimal assumptions about the intervention's effects or the underlying data generating process. In addition to the algorithm, we demonstrate that the asymptotic Type I and II error can be controlled, and provide sufficient conditions for detection consistency---i.e., exact identification of the affected subpopulation. Finally, we validate the efficacy of the method by discovering heterogeneous treatment effects in simulations and in real-world data from a well-known program evaluation study.


Algorithm Linear Regression Example as House Prediction System

#artificialintelligence

Linear regression algorithm is used to predict the continuous valued output from a labeled training set i.e. it is a supervised learning algorithm. It is a well-known algorithm for machine learning as well as it is well-known in Statistics. If you are new to the machine learning, do read introduction to machine learning for beginners. We are having a training set of a House prediction system. We are using features of House i.e. area, the number of bedrooms, etc. as input and its price is our output.


Intro to Machine Learning with Apache Spark and Apache Zeppelin - Hortonworks

#artificialintelligence

In this tutorial, we will introduce you to Machine Learning with Apache Spark. The hands-on lab for this tutorial is an Apache Zeppelin notebook that has all the steps necessary to ingest and explore data, train, test, visualize, and save a model. We will cover a basic Linear Regression model that will allow us perform simple predictions on a sample data. This model can be further expanded and modified to fit your needs. Most importantly, by the end of this tutorial, you will understand how to create an end-to-end pipeline for setting up and training simple models in Spark.


Logistic Regression -- Detailed Overview – Towards Data Science

#artificialintelligence

Logistic Regression was used in the biological sciences in early twentieth century. It was then used in many social science applications. Logistic Regression is used when the dependent variable(target) is categorical. Consider a scenario where we need to classify whether an email is spam or not. If we use linear regression for this problem, there is a need for setting up a threshold based on which classification can be done.


Learning through deterministic assignment of hidden parameters

arXiv.org Machine Learning

Supervised learning frequently boils down to determining hidden and bright parameters in a parameterized hypothesis space based on finite input-output samples. The hidden parameters determine the attributions of hidden predictors or the nonlinear mechanism of an estimator, while the bright parameters characterize how hidden predictors are linearly combined or the linear mechanism. In traditional learning paradigm, hidden and bright parameters are not distinguished and trained simultaneously in one learning process. Such an one-stage learning (OSL) brings a benefit of theoretical analysis but suffers from the high computational burden. To overcome this difficulty, a two-stage learning (TSL) scheme, featured by learning through deterministic assignment of hidden parameters (LtDaHP) was proposed, which suggests to deterministically generate the hidden parameters by using minimal Riesz energy points on a sphere and equally spaced points in an interval. We theoretically show that with such deterministic assignment of hidden parameters, LtDaHP with a neural network realization almost shares the same generalization performance with that of OSL. We also present a series of simulations and application examples to support the outperformance of LtDaHP


Risk and parameter convergence of logistic regression

arXiv.org Machine Learning

The logistic loss is strictly convex and does not attain its infimum; consequently the solutions of logistic regression are in general off at infinity. This work provides a convergence analysis of gradient descent applied to logistic regression under no assumptions on the problem instance. Firstly, the risk is shown to converge at a rate $\mathcal{O}(\ln(t)^2/t)$. Secondly, the parameter convergence is characterized along a unique pair of complementary subspaces defined by the problem instance: one subspace along which strong convexity induces parameters to converge at rate $\mathcal{O}(\ln(t)^2/\sqrt{t})$, and its orthogonal complement along which separability induces parameters to converge in direction at rate $\mathcal{O}(\ln\ln(t) / \ln(t))$.


Confounder Detection in High Dimensional Linear Models using First Moments of Spectral Measures

arXiv.org Machine Learning

In this paper, we study the confounder detection problem in the linear model, where the target variable $Y$ is predicted using its $n$ potential causes $X_n=(x_1,...,x_n)^T$. Based on an assumption of rotation invariant generating process of the model, recent study shows that the spectral measure induced by the regression coefficient vector with respect to the covariance matrix of $X_n$ is close to a uniform measure in purely causal cases, but it differs from a uniform measure characteristically in the presence of a scalar confounder. Then, analyzing spectral measure pattern could help to detect confounding. In this paper, we propose to use the first moment of the spectral measure for confounder detection. We calculate the first moment of the regression vector induced spectral measure, and compare it with the first moment of a uniform spectral measure, both defined with respect to the covariance matrix of $X_n$. The two moments coincide in non-confounding cases, and differ from each other in the presence of confounding. This statistical causal-confounding asymmetry can be used for confounder detection. Without the need of analyzing the spectral measure pattern, our method does avoid the difficulty of metric choice and multiple parameter optimization. Experiments on synthetic and real data show the performance of this method.


Graph-based regularization for regression problems with highly-correlated designs

arXiv.org Machine Learning

Sparse models for high-dimensional linear regression and machine learning have received substantial attention over the past two decades. Model selection, or determining which features or covariates are the best explanatory variables, is critical to the interpretability of a learned model. Much of the current literature assumes that covariates are only mildly correlated. However, in modern applications ranging from functional MRI to genome-wide association studies, covariates are highly correlated and do not exhibit key properties (such as the restricted eigenvalue condition, RIP, or other related assumptions). This paper considers a high-dimensional regression setting in which a graph governs both correlations among the covariates and the similarity among regression coefficients. Using side information about the strength of correlations among features, we form a graph with edge weights corresponding to pairwise covariances. This graph is used to define a graph total variation regularizer that promotes similar weights for highly correlated features. The graph structure encapsulated by this regularizer helps precondition correlated features to yield provably accurate estimates. Using graph-based regularizers to develop theoretical guarantees for highly-correlated covariates has not been previously examined. This paper shows how our proposed graph-based regularization yields mean-squared error guarantees for a broad range of covariance graph structures and correlation strengths which in many cases are optimal by imposing additional structure on $\beta^{\star}$ which encourages \emph{alignment} with the covariance graph. Our proposed approach outperforms other state-of-the-art methods for highly-correlated design in a variety of experiments on simulated and real fMRI data.