 Regression


Treatment Effect Estimation with Data-Driven Variable Decomposition

AAAI Conferences

A fundamental problem in causal inference is estimating treatment effects in observational studies when variables are confounded. Confounding is generally controlled for with the propensity score, but this approach treats all observed variables as confounders and ignores the adjustment variables, which have no influence on treatment but are predictive of the outcome. Recently, it has been demonstrated that the adjustment variables are effective in reducing the variance of the estimated treatment effect. However, how to automatically separate the confounders and adjustment variables in observational studies is still an open problem, especially in scenarios with high dimensional variables, which are common in the big data era. In this paper, we propose a Data-Driven Variable Decomposition (D$^2$VD) algorithm, which can 1) automatically separate confounders and adjustment variables with a data-driven approach, and 2) simultaneously estimate the treatment effect in observational studies with high dimensional variables. Under standard assumptions, we show experimentally that the proposed D$^2$VD algorithm can automatically separate the variables precisely, and estimate the treatment effect more accurately and with tighter confidence intervals than the state-of-the-art methods on both synthetic data and a real online advertising dataset.


Webinar: Improve Your Regression with CART and Gradient Boosting

@machinelearnbot

In this webinar we'll introduce you to a powerful tree-based machine learning algorithm called gradient boosting. Gradient boosting often outperforms linear regression, Random Forests, and CART. Boosted trees automatically handle variable selection, variable interactions, nonlinear relationships, outliers, and missing values. We'll see that CART decision trees are the foundation of gradient boosting and discuss some of the advantages of boosting versus a Random Forest. We will explore the gradient boosting algorithm and discuss the most important modeling parameters like the learning rate, number of terminal nodes, number of trees, loss functions, and more.
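The roles of the learning rate and the number of trees mentioned above can be illustrated with a minimal sketch. It assumes squared-error loss, a single numeric predictor with at least two distinct values, and one-split CART trees (two terminal nodes); every function name here is hypothetical and this is not the TreeNet API.

```python
# Minimal gradient boosting sketch for squared-error loss with one-split trees.
# Assumptions: one numeric predictor, at least two distinct x values.
# All names are illustrative; this is not the TreeNet or any library API.

def fit_stump(x, residuals):
    """Find the single split on x that minimizes squared error of the residuals."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_gradient_boosting(x, y, n_trees=50, learning_rate=0.1):
    """Start from the mean of y, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is the plain residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution by the learning rate.
        predictions = [pi + learning_rate * predict_stump(stump, xi)
                       for pi, xi in zip(predictions, x)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)
```

A smaller learning rate generally needs more trees to reach the same training fit, which is why the two parameters are usually tuned together.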


Intercomparison of Machine Learning Methods for Statistical Downscaling: The Case of Daily and Extreme Precipitation

arXiv.org Machine Learning

Statistical downscaling of global climate models (GCMs) allows researchers to study local climate change effects decades into the future. A wide range of statistical models have been applied to downscaling GCMs, but recent advances in machine learning have not been explored. In this paper, we compare four fundamental statistical methods, Bias Correction Spatial Disaggregation (BCSD), Ordinary Least Squares, Elastic-Net, and Support Vector Machine, with three more advanced machine learning methods, Multi-task Sparse Structure Learning (MSSL), BCSD coupled with MSSL, and Convolutional Neural Networks, to downscale daily precipitation in the Northeast United States. Metrics to evaluate each method's ability to capture daily anomalies, large scale climate shifts, and extremes are analyzed. We find that linear methods, led by BCSD, consistently outperform non-linear approaches. The direct application of state-of-the-art machine learning methods to statistical downscaling does not provide improvements over simpler, longstanding approaches.


metboost: Exploratory regression analysis with hierarchically clustered data

arXiv.org Machine Learning

As data collections become larger, exploratory regression analysis becomes more important but more challenging. When observations are hierarchically clustered the problem is even more challenging because model selection with mixed effect models can produce misleading results when nonlinear effects are not included in the model (Bauer and Cai, 2009). A machine learning method called boosted decision trees (Friedman, 2001) is a good approach for exploratory regression analysis in real data sets because it can detect predictors with nonlinear and interaction effects while also accounting for missing data. We propose an extension to boosted decision trees called metboost for hierarchically clustered data. It works by constraining the structure of each tree to be the same across groups, but allowing the terminal node means to differ. This allows predictors and split points to lead to different predictions within each group, and approximates nonlinear group specific effects. Importantly, metboost remains computationally feasible for thousands of observations and hundreds of predictors that may contain missing values. We apply the method to predict math performance for 15,240 students from 751 schools in data collected in the Educational Longitudinal Study 2002 (Ingels et al., 2007), allowing 76 predictors to have unique effects for each school. When comparing results to boosted decision trees, metboost has 15% improved prediction performance. Results of a large simulation study show that metboost has up to 70% improved variable selection performance and up to 30% improved prediction performance compared to boosted decision trees when group sizes are small.


What is Regression Analysis?

@machinelearnbot

Guest blog by Kevin Gray. Kevin is president of Cannon Gray, a marketing science and analytics consultancy. Regression is arguably the workhorse of statistics. Despite its popularity, however, it may also be the most misunderstood. The answer might surprise you: There is no such thing as Regression. The Dependent Variable is something you want to predict or explain.


r2VIM: A new variable selection method for random forests in genome-wide association studies

#artificialintelligence

In the last few years, more than one thousand single-nucleotide polymorphisms (SNPs) have been reproducibly associated with more than two hundred phenotypes and quantitative traits in genome-wide association studies (GWAS) [1]. These loci are usually identified by linear or logistic regression analysis which is performed separately for each SNP. The resulting p-values are then used to rank the SNPs and to select those with a p-value smaller than a pre-specified significance level which is adjusted for the large number of statistical tests. In such a scenario, comparable to analyses of other genomic data sets such as gene expression, p-values are not used in a confirmatory setting but rather as a screening tool to identify associated, i.e. important, SNPs while controlling the number of false positive findings. Nonparametric, model-free statistical learning machines provide a promising alternative to classical, model-based statistical methods for the selection of important variables in high dimensional data sets.


Getting Started with Tensorflow

#artificialintelligence

It has been almost a year since Tensorflow was released by Google. Although a lot of deep learning libraries are available (like Theano etc.), Tensorflow is pretty big! One of the prominent reasons is that it is backed by the big fish, Google! Tensorflow also has pretty great support for distributed systems. Considering the open-source popularity of Tensorflow and recent advancements in neural network research, this library is here to stay. In this post we will not only introduce Tensorflow but also take an under-the-hood trip through its workings. We will start off by going through the basics of using Tensorflow and analyze the "computational graphs" that form the basis of how Tensorflow works. Later we will build a linear regression model that will further clarify how it works. When we come across the name "Tensorflow", the first thing that invariably comes to mind is the word "tensor". Why "tensor"flow? What is a "tensor"? Well, without dwelling too much on its mathematical representation, consider a tensor as a multidimensional array of numbers. Thus all scalars, vectors, and matrices fall under the category of tensors. In the post's example program, the function tf.constant(value) is used to declare a constant with the given value, and tf.add(a, b) is used to add two tensors a and b.
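The idea of a computational graph described above can be illustrated without Tensorflow itself. The sketch below is a plain-Python stand-in, not the Tensorflow API: the classes Constant, Add, and Multiply are hypothetical names, tensors are restricted to flat lists of numbers, and nothing is computed until evaluate() is called on a node.

```python
# Plain-Python illustration of a deferred computational graph.
# Assumption: this mimics the concept only; it is NOT the Tensorflow API,
# and "tensors" here are limited to flat lists (vectors) of equal length.

class Constant:
    """Leaf node that holds a fixed vector."""
    def __init__(self, value):
        self.value = list(value)

    def evaluate(self):
        return self.value


class Add:
    """Node that adds the outputs of two other nodes elementwise."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def evaluate(self):
        return [p + q for p, q in zip(self.a.evaluate(), self.b.evaluate())]


class Multiply:
    """Node that multiplies the outputs of two other nodes elementwise."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def evaluate(self):
        return [p * q for p, q in zip(self.a.evaluate(), self.b.evaluate())]


# Building the graph only wires nodes together; no arithmetic happens yet.
a = Constant([1, 2])
b = Constant([3, 4])
total = Add(a, b)

# A linear model y = w * x + b expressed as a small graph.
w = Constant([2.0, 2.0, 2.0])
x = Constant([1.0, 2.0, 3.0])
bias = Constant([0.5, 0.5, 0.5])
linear = Add(Multiply(w, x), bias)

# Evaluation walks the graph from the requested node down to the constants.
total_value = total.evaluate()
linear_value = linear.evaluate()
```

Separating graph construction from evaluation is what lets a framework analyze, optimize, or distribute the computation before running it.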


Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory

arXiv.org Machine Learning

The pathwise coordinate optimization is one of the most important computational frameworks for high dimensional convex and nonconvex sparse learning problems. It differs from the classical coordinate optimization algorithms in three salient features: warm start initialization, active set updating, and strong rule for coordinate preselection. Such a complex algorithmic structure grants superior empirical performance, but also poses a significant challenge to theoretical analysis. To tackle this long-lasting problem, we develop a new theory showing that these three features play pivotal roles in guaranteeing the outstanding statistical and computational performance of the pathwise coordinate optimization framework. Particularly, we analyze the existing pathwise coordinate optimization algorithms and provide new theoretical insights into them. The obtained insights further motivate the development of several modifications to improve the pathwise coordinate optimization framework, which guarantees linear convergence to a unique sparse local optimum with optimal statistical properties in parameter estimation and support recovery. This is the first result on the computational and statistical guarantees of the pathwise coordinate optimization framework in high dimensions. Thorough numerical experiments are provided to support our theory.
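Two of the three features named above, warm start initialization and active set updating, can be sketched for the lasso objective (1/2n)||y - Xb||^2 + lambda*||b||_1. This is a simplified illustration under assumptions, not the algorithm analyzed in the paper: the strong rule for coordinate preselection is omitted, the design matrix is passed as a list of columns, and all function names are hypothetical.

```python
# Pathwise coordinate descent sketch for the lasso (convex case only).
# Assumptions: strong-rule preselection is omitted; `cols` is a list of
# feature columns; names are illustrative, not taken from the paper.

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def coordinate_pass(cols, beta, residual, lam, indices):
    """One sweep over `indices`; returns the largest coefficient change."""
    n = len(residual)
    max_change = 0.0
    for j in indices:
        col = cols[j]
        norm_sq = sum(v * v for v in col) / n
        if norm_sq == 0.0:
            continue
        # Correlation of column j with the partial residual (beta[j] removed).
        rho = sum(c * r for c, r in zip(col, residual)) / n + norm_sq * beta[j]
        new_value = soft_threshold(rho, lam) / norm_sq
        delta = new_value - beta[j]
        if delta != 0.0:
            for i in range(n):
                residual[i] -= delta * col[i]
            beta[j] = new_value
            max_change = max(max_change, abs(delta))
    return max_change


def lasso_path(cols, y, lambdas, tol=1e-8, max_iter=1000):
    """Solve the lasso for decreasing lambdas, reusing each solution."""
    d = len(cols)
    beta = [0.0] * d
    residual = list(y)
    path = []
    for lam in sorted(lambdas, reverse=True):
        # Warm start: beta and residual carry over from the previous lambda.
        for _ in range(max_iter):
            # A full sweep lets new coordinates enter the active set.
            if coordinate_pass(cols, beta, residual, lam, range(d)) < tol:
                break
            # Then iterate only over the currently nonzero coordinates.
            active = [j for j in range(d) if beta[j] != 0.0]
            for _ in range(max_iter):
                if coordinate_pass(cols, beta, residual, lam, active) < tol:
                    break
        path.append((lam, list(beta)))
    return path
```

Solving from the largest regularization parameter downward keeps each intermediate solution sparse, so the active-set sweeps touch only a few coordinates at a time.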


How to Make Manual Predictions for ARIMA Models with Python - Machine Learning Mastery

#artificialintelligence

The autoregressive integrated moving average model, or ARIMA model, can seem intimidating to beginners. A good way to pull back the curtain on the method is to use a trained model to make predictions manually. This demonstrates that ARIMA is a linear regression model at its core. Making manual predictions with a fit ARIMA model may also be a requirement in your project, meaning that you can save the coefficients from the fit model and use them as configuration in your own code to make predictions without the need for heavy Python libraries in a production environment. In this tutorial, you will discover how to make manual predictions with a trained ARIMA model in Python.
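As a rough sketch of what such a manual prediction can look like, the code below makes a one-step forecast for an ARIMA(p, 1, q) model using only saved coefficients. It is written under assumptions rather than taken from the tutorial: the function names are hypothetical, the coefficient values in the usage lines are made up for illustration, and the sign convention (AR and MA terms both added) should be checked against whatever library produced the coefficients.

```python
# One-step manual forecast for an ARIMA(p, 1, q) model from saved coefficients.
# Assumptions: first-order differencing, additive sign convention for AR and
# MA terms, and illustrative (made-up) coefficient values in the usage lines.

def difference(series):
    """First-order differencing: d[t] = y[t] - y[t-1]."""
    return [series[i] - series[i - 1] for i in range(1, len(series))]


def predict_next(history, ar_coef, ma_coef, residuals, intercept=0.0):
    """Forecast the next value as a linear combination of lagged terms.

    history   : observed series, oldest first
    ar_coef   : AR coefficients, lag 1 first
    ma_coef   : MA coefficients, lag 1 first
    residuals : past one-step forecast errors, oldest first
    """
    diff = difference(history)
    forecast_diff = intercept
    # AR part: weighted sum of the most recent differenced observations.
    for lag, phi in enumerate(ar_coef, start=1):
        forecast_diff += phi * diff[-lag]
    # MA part: weighted sum of the most recent forecast errors.
    for lag, theta in enumerate(ma_coef, start=1):
        forecast_diff += theta * residuals[-lag]
    # Undo the differencing to return to the original scale.
    return history[-1] + forecast_diff


# Usage with hypothetical values standing in for saved model coefficients.
history = [10.0, 12.0, 15.0]
forecast = predict_next(history, ar_coef=[0.5, 0.25], ma_coef=[0.5],
                        residuals=[1.0])
```

Because the forecast is just a weighted sum plus an inverse difference, it needs nothing beyond the saved coefficients and a short window of recent observations and errors.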


Improve Your Regression with CART and Gradient Boosting

#artificialintelligence

We'll see that CART decision trees are the foundation of gradient boosting and discuss some of the advantages of boosting versus a Random Forest. We will explore the gradient boosting algorithm and discuss the most important modeling parameters like the learning rate, number of terminal nodes, number of trees, loss functions, and more. We will demonstrate using an implementation of gradient boosting (TreeNet Software) to fit the model and compare the performance to a linear regression model, a CART tree, and a Random Forest.