 Regression


Wasserstein-based fairness interpretability framework for machine learning models

arXiv.org Artificial Intelligence

The objective of this article is to introduce a fairness interpretability framework for measuring and explaining the bias in classification and regression models at the level of a distribution. In our work, we measure the model bias across sub-population distributions in the model output using the Wasserstein metric. To properly quantify the contributions of predictors, we take into account the favorability of both the model and predictors with respect to the non-protected class. The quantification is accomplished by the use of transport theory, which gives rise to the decomposition of the model bias and bias explanations into positive and negative contributions. To gain more insight into the role of favorability and allow for additivity of bias explanations, we adapt techniques from cooperative game theory.
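To make the idea of comparing sub-population distributions of the model output concrete, here is a minimal sketch of the one-dimensional Wasserstein-1 distance between two empirical samples. It is not the paper's framework (it does not decompose the bias or attribute it to predictors); the function name `wasserstein_1d` and the score lists are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two 1-D empirical samples.

    Computed as the integral of |F_a(t) - F_b(t)| over t, where F_a and
    F_b are the empirical CDFs of the two samples.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    # Both CDFs are constant between consecutive distinct sample values.
    grid = sorted(set(a_sorted) | set(b_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_a = bisect_right(a_sorted, left) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, left) / len(b_sorted)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Hypothetical model scores for two sub-populations (made-up numbers).
scores_non_protected = [0.3, 0.5, 0.7]
scores_protected = [0.2, 0.4, 0.6]

bias = wasserstein_1d(scores_non_protected, scores_protected)
print(f"Wasserstein-1 distance between score distributions: {bias:.3f}")
```

A distance of zero means the two groups receive identically distributed scores; larger values mean more probability mass has to be moved to turn one distribution into the other.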


Linear Regression : decoded

#artificialintelligence

Everyone wants to try their hand at machine learning at some point in their software career. The first algorithm that almost all books and online courses start with is linear regression. "Linear" means arranged in a straight line, so the idea of understanding the relationship between two variables by fitting a straight line is called linear regression. Let us take an example: the price of a house with respect to its size.
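As a rough illustration of the house-price example, the sketch below fits a straight line with the standard closed-form least-squares formulas. The sizes and prices are invented for the example, and `fit_line` is a hypothetical helper name, not something taken from the article.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: house size in square metres, price in thousands.
sizes = [50.0, 80.0, 100.0, 120.0, 150.0]
prices = [155.0, 238.0, 305.0, 352.0, 460.0]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90.0 + intercept
print(f"price ~ {slope:.2f} * size + {intercept:.2f}")
print(f"predicted price for a 90 m^2 house: {predicted:.1f}")
```

The slope reads as "extra price per extra square metre", which is the relationship between the two variables that the line is meant to capture.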


Deep Learning Prerequisites: Linear Regression in Python

#artificialintelligence

This course teaches you about one popular technique used in machine learning, data science and statistics: linear regression. We cover the theory from the ground up: derivation of the solution, and applications to real-world problems. We show you how one might code their own linear regression module in Python. Linear regression is the simplest machine learning model you can learn, yet there is so much depth that you'll be returning to it for years to come.


Creating Regression Models to Predict Data Responses

#artificialintelligence

Before we look at any code, we should understand a little about the math behind a regression model. As mentioned, regression models can have multiple input variables, or features, but for this article, we will use a single feature for simplicity. Regression analysis involves making a guess at what type of function would fit your dataset the best, whether that be a line, an nth degree polynomial, a logarithmic function, etc. Regression models assume the dataset follows this form: $y_i = f(x_i) + e_i$. Here, $x_i$ and $y_i$ are our feature and response at observation $i$, and $e_i$ is an error term. The goal of the regression model is to estimate the function, $f$, so that it most closely fits the dataset (neglecting the error term). The function, $f$, is the guess we make about what type of function would best fit our dataset.
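One way to see what "making a guess at the type of function" means in practice is to fit two candidate families to the same data and compare how closely each fits. The sketch below tries a line, f(x) = a + b·x, and a logarithmic function, f(x) = a + b·ln(x), and compares their sums of squared errors. The data and the helper names `least_squares` and `sse` are assumptions made for illustration only.

```python
import math


def least_squares(us, ys):
    """Fit y ~ a + b * u by ordinary least squares; return (a, b)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    b = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys)) / sum(
        (u - mean_u) ** 2 for u in us
    )
    a = mean_y - b * mean_u
    return a, b


def sse(f, xs, ys):
    """Sum of squared errors between the responses and f's predictions."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


# Made-up observations that grow roughly logarithmically in x.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [2.1, 4.0, 6.2, 8.2, 10.4]

# Candidate 1: a straight line in x.
a_lin, b_lin = least_squares(xs, ys)
sse_line = sse(lambda x: a_lin + b_lin * x, xs, ys)

# Candidate 2: a line in ln(x), i.e. a logarithmic function of x.
a_log, b_log = least_squares([math.log(x) for x in xs], ys)
sse_log = sse(lambda x: a_log + b_log * math.log(x), xs, ys)

print(f"SSE for the line:            {sse_line:.3f}")
print(f"SSE for the log function:    {sse_log:.3f}")
```

On this data the logarithmic family leaves a much smaller squared error, so it would be the better guess for f. On real data, comparing fits on held-out observations is usually safer than comparing them on the training points.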


Distributional Hardness Against Preconditioned Lasso via Erasure-Robust Designs

arXiv.org Machine Learning

Sparse linear regression with ill-conditioned Gaussian random designs is widely believed to exhibit a statistical/computational gap, but there is surprisingly little formal evidence for this belief, even in the form of examples that are hard for restricted classes of algorithms. Recent work has shown that, for certain covariance matrices, the broad class of Preconditioned Lasso programs provably cannot succeed on polylogarithmically sparse signals with a sublinear number of samples. However, this lower bound only shows that for every preconditioner, there exists at least one signal that it fails to recover successfully. This leaves open the possibility that, for example, trying multiple different preconditioners solves every sparse linear regression problem. In this work, we prove a stronger lower bound that overcomes this issue. For an appropriate covariance matrix, we construct a single signal distribution on which any invertibly-preconditioned Lasso program fails with high probability, unless it receives a linear number of samples. Surprisingly, at the heart of our lower bound is a new positive result in compressed sensing. We show that standard sparse random designs are with high probability robust to adversarial measurement erasures, in the sense that if $b$ measurements are erased, then all but $O(b)$ of the coordinates of the signal are still information-theoretically identifiable. To our knowledge, this is the first time that partial recoverability of arbitrary sparse signals under erasures has been studied in compressed sensing.


Applied Machine Learning in R

#artificialintelligence

They are powerful data mining techniques that allow you to detect patterns in your data or variables. For each technique, a number of practical exercises are proposed. By doing these exercises you'll put into practice what you have learned. This course is your opportunity to become a machine learning expert in only a few weeks! With my video lectures, you will find it very easy to master the major machine learning techniques. Everything is shown live, step by step, so you can replicate any procedure whenever you need it. So click the "Enroll" button to get instant access to your machine learning course. It will surely provide you with priceless new skills. And, who knows, it could give you a tremendous career boost in the near future.


Adaptive Semi-Supervised Inference for Optimal Treatment Decisions with Electronic Medical Record Data

arXiv.org Machine Learning

A treatment regime is a rule that assigns a treatment to patients based on their covariate information. Recently, estimation of the optimal treatment regime that yields the greatest overall expected clinical outcome of interest has attracted a lot of attention. In this work, we consider estimation of the optimal treatment regime with electronic medical record data under a semi-supervised setting. Here, data consist of two parts: a set of 'labeled' patients for whom we have the covariate, treatment and outcome information, and a much larger set of 'unlabeled' patients for whom we only have the covariate information. We propose an imputation-based semi-supervised method, utilizing 'unlabeled' individuals to obtain a more efficient estimator of the optimal treatment regime. The asymptotic properties of the proposed estimators and their associated inference procedure are provided. Simulation studies are conducted to assess the empirical performance of the proposed method and to compare it with a fully supervised method using only the labeled data. An application to an electronic medical record data set on the treatment of hypotensive episodes during intensive care unit (ICU) stays is also given for further illustration.


Logistic Regression for Binary Classification: Hands-On with SciKit-Learn

#artificialintelligence

Originally published on Towards AI.


3 Evaluation Metrics for Regression

#artificialintelligence

Regression-based machine learning models are used to predict the value of a continuous attribute. As with all supervised machine learning problems, the model is trained using a set of features (X) to learn the mapping to a target variable (y). In the case of regression, the target is a continuous variable such as the price of a house. Probably the simplest regression algorithm is linear regression. Simple linear regression, where there is only one feature and one target, is represented by the equation $y = \beta_0 + \beta_1 x$, where $\beta_0$ is the intercept and $\beta_1$ is the slope.
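The excerpt does not say which three metrics the article covers, so the sketch below is only an assumption: it implements three commonly used ones, mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²), on made-up targets and predictions.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more than MAE does."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

print(f"MAE:  {mae(y_true, y_pred):.3f}")
print(f"RMSE: {rmse(y_true, y_pred):.3f}")
print(f"R^2:  {r2(y_true, y_pred):.3f}")
```

MAE and RMSE are expressed in the units of the target (for example, currency for house prices), while R² is unitless and compares the model against always predicting the mean.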


Are Latent Factor Regression and Sparse Regression Adequate?

arXiv.org Machine Learning

We propose the Factor Augmented sparse linear Regression Model (FARM) that not only encompasses both the latent factor regression and sparse linear regression as special cases but also bridges dimension reduction and sparse regression together. We provide theoretical guarantees for the estimation of our model in the presence of sub-Gaussian and heavy-tailed noises (with bounded (1+x)-th moment, for all x>0), respectively. In addition, existing works on supervised learning often assume that the latent factor regression or the sparse linear regression is the true underlying model without justifying its adequacy. To fill this important gap, we also leverage our model as the alternative model to test the sufficiency of the latent factor regression and the sparse linear regression models. To accomplish these goals, we propose the Factor-Adjusted de-Biased Test (FabTest) and a two-stage ANOVA type test, respectively. We also conduct large-scale numerical experiments including both synthetic and FRED macroeconomics data to corroborate the theoretical properties of our methods. Numerical results illustrate the robustness and effectiveness of our model against latent factor regression and sparse linear regression models.