

Regression


The Perfect Recipe for Classification Using Logistic Regression

#artificialintelligence

Supervised Learning is an essential part of Machine Learning. Classification techniques are used when the variable to be predicted is categorical. A common example of a classification problem is classifying an Iris flower into one of its three species. Logistic regression is a classification technique that machine learning borrowed from the field of statistics: it is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome.
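To make the statistical method concrete, here is a minimal sketch of binary logistic regression fitted by batch gradient descent on the log-loss, in plain Python. The model is P(y = 1 | x) = σ(w·x + b) with σ(z) = 1 / (1 + e^(-z)). The one-feature toy data, learning rate, and epoch count are hypothetical and unrelated to the Iris measurements; in practice a library implementation would normally be used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    # P(y = 1 | x) = sigmoid(w . x + b)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    # Batch gradient descent on the mean log-loss (cross-entropy).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the linear score.
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical 1-D toy data (not the Iris dataset): one feature, two classes.
X = [[-3.0], [-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [3.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X, y)
labels = [int(predict_proba(w, b, xi) >= 0.5) for xi in X]
```

Thresholding the predicted probability at 0.5 turns the probabilistic model into a classifier; the three-species Iris case needs the multiclass extension.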


Data Science & Deep Learning for Business 20 Case Studies

#artificialintelligence

All MBAs will preach that Data-Driven Methods
- Understand the value of data for businesses
- Learn to use Python, Pandas, Matplotlib & Seaborn, SkLearn, Keras, TensorFlow, NLTK, Prophet, PySpark, MLlib and more!
- Apply Data Science in Marketing to improve Conversion Rates, Predict Engagement and Customer Lifetime Value
- Machine Learning from Linear Regressions (polynomial & multivariate), K-NNs, Logistic Regressions, SVMs, Decision Trees & Random Forests
- Unsupervised Machine Learning with K-Means, Mean-Shift, DBSCAN, EM with GMMs, PCA and t-SNE
- Build a Product Recommendation Tool using collaborative & item/content-based filtering
- Hypothesis Testing and A/B Testing: understand t-tests and p-values
- Natural Language Processing: summarize reviews, sentiment analysis on airline tweets & spam detection
- Use Google Colab's IPython notebooks for fast, reliable cloud-based data science work
- Deploy your Machine Learning Models on the cloud using AWS

This course takes on Machine Learning and Statistical theory and teaches you to use it in solving 20 real-world Business problems. Data Scientist is the buzz of the 21st century for good reason! The tech revolution is just starting and Data Science is at the forefront. As a result, "Data Scientist has become the top job in the US for the last 4 years running!" according to Harvard Business Review & Glassdoor.


Logistic Regression for Beginners - A Complete Guide - Let's Discuss Stuff

#artificialintelligence

Logistic Regression is one of the most widely used classification algorithms in machine learning. It is used in many real-world scenarios such as spam detection, cancer detection, and classifying the Iris dataset. It is mostly used for binary classification problems, but it can also be used for multiclass classification. Logistic Regression predicts the probability that a given data point belongs to a certain class. In this article, I will use the famous heart disease dataset from Kaggle, where the main goal is to predict whether a given person has heart disease.
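The passage says logistic regression outputs class probabilities and extends to multiclass problems. Below is a minimal sketch of both points. It assumes made-up weights rather than anything fitted to the Kaggle heart disease data. Multinomial (softmax) logistic regression scores each class linearly and normalizes the scores. With two classes, fixing one class's weights at zero recovers the familiar sigmoid.

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability; the result sums to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba_multiclass(W, b, x):
    # One weight vector and one bias per class; each class score is linear in x.
    scores = [sum(wj * xj for wj, xj in zip(wk, x)) + bk for wk, bk in zip(W, b)]
    return softmax(scores)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights for a 3-class problem with 2 features.
W = [[1.0, -0.5], [0.2, 0.3], [-1.0, 0.8]]
b = [0.0, 0.1, -0.2]
probs = predict_proba_multiclass(W, b, [0.5, 1.0])
predicted_class = probs.index(max(probs))

# Two classes with the first class's weights fixed at zero: softmax reduces to the sigmoid.
w, b0, x = 1.5, -0.3, [2.0]
two_class = predict_proba_multiclass([[0.0], [w]], [0.0, b0], x)
```

A binary decision then thresholds the class-1 probability, typically at 0.5. The threshold can be moved when false negatives (for example, missed heart disease) cost more than false positives.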


Conformal prediction interval for dynamic time-series

arXiv.org Machine Learning

We develop a method, called EnbPI, to build distribution-free prediction intervals in batches for time series based on conformal inference; it wraps around any ensemble estimator to construct sequential prediction intervals. EnbPI is closely related to the conformal prediction (CP) framework but does not require data exchangeability. Theoretically, these intervals attain finite-sample, approximately valid average coverage for broad classes of regression functions and time series with strongly mixing stochastic errors. Computationally, EnbPI does not require training multiple ensemble estimators; it operates efficiently around an already trained ensemble estimator. In general, EnbPI is easy to implement, scales to producing arbitrarily many prediction intervals sequentially, and is well suited to a wide range of regression functions. We perform extensive simulations and real-data analyses to demonstrate its effectiveness.
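The following is a simplified EnbPI-style sketch in plain Python. It assumes a toy least-squares line as the base model, which is a hypothetical choice since the abstract allows any ensemble estimator. The sketch trains a bootstrap ensemble once, computes leave-one-out residuals using only the models whose bootstrap samples excluded each point, and widens each forecast by a (1 - α) quantile of those residuals. The residual window then slides as new observations arrive, with no retraining. Two simplifications should be noted. The point forecast here is the plain mean of all bootstrap models, which may differ from the aggregation used in the paper. The window also slides one observation at a time. This is therefore not the authors' reference implementation.

```python
import math
import random

def fit_line(xs, ys):
    # Ordinary least squares for y = a + b*x; returns a predictor function.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def enbpi_sketch(xs, ys, x_test, y_test, n_boot=20, alpha=0.1, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models, bags = [], []
    # 1. Train the bootstrap ensemble once; it is never retrained below.
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
        bags.append(set(idx))
    # 2. Leave-one-out residuals: for point i, average only the models
    #    whose bootstrap sample did not contain i.
    residuals = []
    for i in range(n):
        loo = [m for m, bag in zip(models, bags) if i not in bag]
        if loo:
            pred = sum(m(xs[i]) for m in loo) / len(loo)
            residuals.append(abs(ys[i] - pred))
    intervals = []
    for x, y in zip(x_test, y_test):
        # Simplification: point forecast = mean of all bootstrap models.
        pred = sum(m(x) for m in models) / n_boot
        r = sorted(residuals)
        k = min(len(r) - 1, max(0, math.ceil((1 - alpha) * len(r)) - 1))
        intervals.append((pred - r[k], pred + r[k]))
        # 3. Once y is observed, slide the residual window by one step.
        residuals.pop(0)
        residuals.append(abs(y - pred))
    return intervals

# Hypothetical series: linear trend plus Gaussian noise.
rng = random.Random(42)
t = list(range(150))
series = [0.5 * ti + rng.gauss(0, 1) for ti in t]
ints = enbpi_sketch(t[:100], series[:100], t[100:], series[100:])
coverage = sum(lo <= yv <= hi for (lo, hi), yv in zip(ints, series[100:])) / len(ints)
```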


Transferable Calibration with Lower Bias and Variance in Domain Adaptation

arXiv.org Machine Learning

Domain Adaptation (DA) enables transferring a learning machine from a labeled source domain to an unlabeled target domain. While remarkable advances have been made, most existing DA methods focus on improving target accuracy at inference. Estimating the predictive uncertainty of DA models is vital for decision-making in safety-critical scenarios, yet it remains largely unexplored. In this paper, we delve into the open problem of calibration in DA, which is extremely challenging due to the coexistence of domain shift and the lack of target labels. We first reveal a dilemma: DA models achieve higher accuracy at the expense of well-calibrated probabilities. Driven by this finding, we propose Transferable Calibration (TransCal) to achieve more accurate calibration with lower bias and variance in a unified, hyperparameter-free optimization framework. As a general post-hoc calibration method, TransCal can easily be applied to recalibrate existing DA methods. Its efficacy is justified both theoretically and empirically.
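To make the notion of calibration concrete, here is a minimal sketch of two generic post-hoc tools. Neither is TransCal itself. The first is expected calibration error (ECE), which bins predictions by confidence and averages the gap between accuracy and confidence. The second is temperature scaling, which divides the logits by a single scalar T fitted by minimizing negative log-likelihood on labeled data. The grid search and the toy logits are hypothetical choices. TransCal's setting lacks target labels, and this sketch simply assumes labels are available.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; T > 1 softens overconfident predictions.
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_calibration_error(confidences, correct, n_bins=10):
    # Weighted average over bins of |accuracy - mean confidence|.
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        members = [i for i, c in enumerate(confidences)
                   if lo < c <= hi or (b == 0 and c == 0.0)]
        if not members:
            continue
        acc = sum(correct[i] for i in members) / len(members)
        conf = sum(confidences[i] for i in members) / len(members)
        ece += len(members) / n * abs(acc - conf)
    return ece

def nll(logits_list, labels, temperature):
    # Mean negative log-likelihood of the true labels at a given temperature.
    return -sum(math.log(softmax(z, temperature)[y])
                for z, y in zip(logits_list, labels)) / len(labels)

def fit_temperature(logits_list, labels):
    # Hypothetical grid search over T in [0.05, 10.0].
    grid = [0.05 * k for k in range(1, 201)]
    return min(grid, key=lambda t: nll(logits_list, labels, t))

# Toy overconfident model: ~99% confidence but only 75% accuracy.
logits = [[5.0, 0.0]] * 4
labels = [0, 0, 1, 0]
T = fit_temperature(logits, labels)

def ece_at(t):
    probs = [softmax(z, t) for z in logits]
    conf = [max(p) for p in probs]
    correct = [p.index(max(p)) == y for p, y in zip(probs, labels)]
    return expected_calibration_error(conf, correct)
```

Here the fitted T is well above 1, pulling the confidence from about 0.99 toward the observed 0.75 accuracy and shrinking the ECE. Under domain shift, fitting T on source data alone is what the paper argues is insufficient.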


Causality-aware counterfactual confounding adjustment as an alternative to linear residualization in anticausal prediction tasks based on linear learners

arXiv.org Artificial Intelligence

Linear residualization is a common practice for confounding adjustment in machine learning (ML) applications. Recently, causality-aware predictive modeling has been proposed as an alternative causality-inspired approach for adjusting for confounders. The basic idea is to simulate counterfactual data that is free from the spurious associations generated by the observed confounders. In this paper, we compare the linear residualization approach against the causality-aware confounding adjustment in anticausal prediction tasks, and show that the causality-aware approach tends to (asymptotically) outperform the residualization adjustment in terms of predictive performance in linear learners. Importantly, our results still hold even when the true model is not linear. We illustrate our results in both regression and classification tasks, comparing the causality-aware and residualization approaches using mean squared error and classification accuracy in synthetic data experiments, both where the linear regression model is misspecified and where it is correctly specified. Furthermore, we illustrate how the causality-aware approach is more stable than residualization with respect to dataset shifts in the joint distribution of the confounders and outcome variables.
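Here is a minimal sketch contrasting the two adjustments on a single feature. It assumes a hypothetical linear anticausal simulation (C → Y, C → X, Y → X) with made-up coefficients. Residualization regresses X on C alone and subtracts the fitted C term. For the causality-aware variant, the sketch estimates the effect of C on X while also conditioning on Y, then subtracts only that effect to form counterfactual features. This estimator is an assumption based on one reading of the approach, since the abstract does not spell it out. The code is an illustration under these assumptions, not the authors' implementation.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def centered(v):
    m = mean(v)
    return [x - m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def corr(a, b):
    ca, cb = centered(a), centered(b)
    return dot(ca, cb) / math.sqrt(dot(ca, ca) * dot(cb, cb))

def residualize(x, c):
    # Regress X on C only, then remove the fitted C component.
    xc, cc = centered(x), centered(c)
    beta = dot(xc, cc) / dot(cc, cc)
    return [xi - beta * ci for xi, ci in zip(x, c)], beta

def causality_aware(x, c, y):
    # Regress X on (C, Y) jointly (2x2 normal equations via Cramer's rule),
    # then remove only the estimated direct effect of C on X.
    xc, cc, yc = centered(x), centered(c), centered(y)
    s_cc, s_yy, s_cy = dot(cc, cc), dot(yc, yc), dot(cc, yc)
    s_xc, s_xy = dot(xc, cc), dot(xc, yc)
    det = s_cc * s_yy - s_cy ** 2
    beta_c = (s_xc * s_yy - s_xy * s_cy) / det
    return [xi - beta_c * ci for xi, ci in zip(x, c)], beta_c

# Hypothetical simulation: confounder C affects both Y and X; Y causes X.
rng = random.Random(1)
n = 2000
c = [rng.gauss(0, 1) for _ in range(n)]
y = [ci + rng.gauss(0, 1) for ci in c]
x = [ci + 2.0 * yi + rng.gauss(0, 1) for ci, yi in zip(c, y)]
x_res, b_res = residualize(x, c)        # b_res absorbs part of Y's effect
x_cf, b_cf = causality_aware(x, c, y)   # b_cf targets C's direct effect (1.0)
```

In this simulation, residualization also strips out the part of X that Y contributes through its correlation with C, so `x_res` correlates less with Y than `x_cf` does. Computing `x_cf` for new data needs only X and C, because `beta_c` is estimated on training data.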


Filling Missing Wind Speed Data Using Various Regression Technique

#artificialintelligence

Missing data is very common when you collect data, but it becomes a problem in the data analysis phase. A common practice, and for me at least the best one, is simply to ignore the missing data. No matter how good your method for filling in the missing pieces is, it always introduces some error, and the filled-in values cannot truly recover the part of the pattern that is missing from the data.
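To contrast with the author's preference for ignoring gaps, here is a minimal sketch of the simplest regression technique. It fills gaps in one series by ordinary least squares on a second, correlated series (for example, a nearby station), fitted only on time steps where both are observed. The two-station setup and the numbers are hypothetical, and the article's own techniques may differ.

```python
def fit_ols(xs, ys):
    # Simple linear regression y = intercept + slope * x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def impute_by_regression(target, predictor):
    # Fit on rows where both series are observed (None marks a gap).
    pairs = [(p, t) for t, p in zip(target, predictor)
             if t is not None and p is not None]
    intercept, slope = fit_ols([p for p, _ in pairs], [t for _, t in pairs])
    # Keep observed values; fill gaps only where the predictor is available.
    return [t if t is not None else
            (intercept + slope * p if p is not None else None)
            for t, p in zip(target, predictor)]

# Hypothetical wind speeds (m/s) at two nearby stations.
station_a = [3.1, 4.0, None, 5.2, None, 2.8]
station_b = [2.9, 3.8, 4.4, 5.0, 3.5, 2.7]
filled = impute_by_regression(station_a, station_b)
```

Every filled value is a model prediction, so it carries exactly the fitting error the author warns about. Gaps where the predictor series is also missing stay unfilled.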


Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

arXiv.org Machine Learning

This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation. When there is a large number of candidate features, our result shows that sparsity-aware methods can make batch RL more sample efficient. We first consider the off-policy policy evaluation problem. To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension. To reduce the Lasso bias, we further propose a post model-selection estimator that applies fitted Q-evaluation to the features selected via group Lasso. Under an additional signal strength assumption, we derive a sharper instance-dependent error bound that depends on a divergence function measuring the distribution mismatch between the data distribution and the occupancy measure of the target policy. Further, we study Lasso fitted Q-iteration for batch policy optimization and establish a finite-sample error bound depending on the ratio between the number of relevant features and the restricted minimal eigenvalue of the data's covariance. Finally, we complement the results with minimax lower bounds for batch-data policy evaluation/optimization that nearly match our upper bounds. The results suggest that having well-conditioned data is crucial for sparse batch policy learning.
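Here is a minimal sketch of Lasso fitted Q-evaluation as the abstract names it. Starting from w = 0, it repeatedly regresses the Bellman targets r + γ·φ(s', π(s'))·w on φ(s, a) with an ℓ1 penalty, solved here by cyclic coordinate descent. The synthetic features are hypothetical, as is the reward construction that makes Q exactly linear in two of ten features. All constants (γ = 0.5, λ = 0.05) are also hypothetical. The paper's post-selection estimator with group Lasso is not shown.

```python
import random

def soft_threshold(rho, lam):
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, w=None, sweeps=30):
    # Minimizes (1/2n)||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.
    n, d = len(X), len(X[0])
    w = list(w) if w is not None else [0.0] * d
    resid = [y[i] - sum(X[i][j] * w[j] for j in range(d)) for i in range(n)]
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the partial residual (w_j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z[j]
            delta = new - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new
    return w

def lasso_fqe(phi, rewards, phi_next, gamma, lam, iters=20):
    # phi_next[i] is assumed to be phi(s'_i, pi(s'_i)) for the target policy pi.
    w = [0.0] * len(phi[0])
    for _ in range(iters):
        targets = [r + gamma * sum(a * b for a, b in zip(pn, w))
                   for r, pn in zip(rewards, phi_next)]
        w = lasso_cd(phi, targets, lam, w)
    return w

# Hypothetical data: Q(s, a) = phi . w_true with only 2 of 10 features relevant.
rng = random.Random(0)
d, n, gamma = 10, 200, 0.5
w_true = [1.0, -0.5] + [0.0] * (d - 2)
phi = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
phi_next = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
q = lambda f: sum(a * b for a, b in zip(f, w_true))
# Rewards chosen so the Bellman equation Q = r + gamma * Q_next holds up to noise.
rewards = [q(f) - gamma * q(fn) + rng.gauss(0, 0.1) for f, fn in zip(phi, phi_next)]
w_hat = lasso_fqe(phi, rewards, phi_next, gamma, lam=0.05)
```

Under these assumptions, soft-thresholding typically drives the irrelevant weights to exactly zero. The two relevant weights are shrunk slightly toward zero, which is the Lasso bias that the post-selection estimator is meant to reduce.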


Do We Exploit all Information for Counterfactual Analysis? Benefits of Factor Models and Idiosyncratic Correction

arXiv.org Machine Learning

The measurement of treatment (intervention) effects on a single treated unit (or just a few) based on counterfactuals constructed from artificial controls has become a popular practice in applied statistics and economics since the proposal of the synthetic control method. In high-dimensional settings, we often use principal component or (weakly) sparse regression to estimate counterfactuals. Do we use all the information in the data? To better estimate the effects of price changes on sales in our case study, we propose a general framework for counterfactual analysis of high-dimensional dependent data. The framework includes both principal component regression and sparse linear regression as special cases. It uses both factor and idiosyncratic components as predictors for improved counterfactual analysis, resulting in a method called Factor-Adjusted Regularized Method for Treatment (FarmTreat) evaluation. We demonstrate convincingly that using either factors or sparse regression alone is inadequate for counterfactual analysis in many applications, and that information can be gained by using the idiosyncratic components. We also develop theory and methods to formally answer whether common factors are adequate for estimating counterfactuals. Furthermore, we consider a simple resampling approach to conduct inference on the treatment effect, as well as a bootstrap test to assess the relevance of the idiosyncratic components. We apply the proposed method to evaluate the effects of price changes on the sales of a set of products based on a novel large panel of sales data from a major retail chain in Brazil, and demonstrate the benefits of using additional idiosyncratic components in the treatment effect evaluations.


Deconvoluting Kernel Density Estimation and Regression for Locally Differentially Private Data

arXiv.org Machine Learning

Local differential privacy has become the gold standard in the privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differentially private data can distort the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of privacy-preserving data (no matter how many samples we gather) is always flatter than the density function of the original data points, due to convolution with the density function of the privacy-preserving noise. The effect is especially pronounced when using slowly decaying privacy-preserving noise, such as Laplace noise. This can result in under- or over-estimation of the heavy hitters. This is an important challenge facing social scientists due to the use of differential privacy in the 2020 Census in the United States. In this paper, we develop density estimation methods using smoothing kernels. We use the framework of deconvoluting kernel density estimators to remove the effect of privacy-preserving noise. This approach also allows us to adapt results from non-parametric regression with errors-in-variables to develop regression models based on locally differentially private data. We demonstrate the performance of the developed methods on financial and demographic datasets.
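Here is a minimal sketch of the deconvoluting-kernel idea for one common case. It assumes Laplace(0, b) privacy noise (for the Laplace mechanism, b = sensitivity / ε) and a Gaussian kernel φ. The Laplace characteristic function is 1/(1 + b²t²). Dividing it out of the kernel's Fourier transform gives the closed-form kernel K*(u) = φ(u)[1 + (b/h)²(1 - u²)]. The simulated data, b = 1, and bandwidth h = 0.5 are hypothetical choices. The paper's bandwidth selection and regression estimator are not shown.

```python
import math
import random

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def naive_kde(x, samples, h):
    # Ordinary KDE on the privatized samples: estimates the flattened density.
    return sum(gaussian((x - z) / h) for z in samples) / (len(samples) * h)

def deconvoluting_kde(x, samples, h, b):
    # Deconvoluting kernel for Laplace(0, b) noise and a Gaussian kernel:
    # K*(u) = phi(u) * (1 + (b/h)^2 * (1 - u^2)).
    total = 0.0
    for z in samples:
        u = (x - z) / h
        total += gaussian(u) * (1 + (b / h) ** 2 * (1 - u * u))
    return total / (len(samples) * h)

rng = random.Random(0)
b, h, n = 1.0, 0.5, 5000
true_values = [rng.gauss(0, 0.5) for _ in range(n)]
# Laplace(0, b) noise as the difference of two Exp(rate 1/b) variables.
private = [v + rng.expovariate(1 / b) - rng.expovariate(1 / b) for v in true_values]
naive_at_0 = naive_kde(0.0, private, h)
deconv_at_0 = deconvoluting_kde(0.0, private, h, b)
```

The naive KDE of the privatized values underestimates the peak because it estimates the convolved, flatter density. The deconvoluting estimate instead targets the original density smoothed only by the kernel, which is 1/√π ≈ 0.564 at 0 in this simulation. The price is higher variance, and the estimate can dip below zero in the tails.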