 Regression


Kernel Balancing: A flexible non-parametric weighting procedure for estimating causal effects

arXiv.org Machine Learning

In the absence of unobserved confounders, matching and weighting methods are widely used to estimate causal quantities including the Average Treatment Effect on the Treated (ATT). Unfortunately, these methods do not necessarily achieve their goal of making the multivariate distribution of covariates for the control group identical to that of the treated, leaving some (potentially multivariate) functions of the covariates with different means between the two groups. When these "imbalanced" functions influence the non-treatment potential outcome, the conditioning on observed covariates fails, and ATT estimates may be biased. Kernel balancing, introduced here, targets a weaker requirement for unbiased ATT estimation, specifically, that the expected non-treatment potential outcomes for the treated and control groups are equal. The conditional expectation of the non-treatment potential outcome is assumed to fall in the space of functions associated with a choice of kernel, implying a set of basis functions in which this regression surface is linear. Weights are then chosen on the control units such that the treated and control groups have equal means on these basis functions. As a result, the expectation of the non-treatment potential outcome must also be equal for the treated and control groups after weighting, allowing unbiased ATT estimation by subsequent difference in means or an outcome model using these weights. Moreover, the weights produced are (1) precisely those that equalize a particular kernel-based approximation of the multivariate distribution of covariates for the treated and control, and (2) equivalent to a form of stabilized inverse propensity score weighting, though the method does not require assuming any model of the treatment assignment mechanism. An R package, KBAL, is provided to implement this approach.
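The weighting step described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the KBAL package's implementation: the Gaussian kernel, the bandwidth value, the number of retained eigenvectors, and the function names `gaussian_kernel` and `balancing_weights` are all hypothetical choices. The sketch also swaps in a plain minimum-norm least-squares solve, which can return negative weights; the abstract does not say which optimization the authors use.

```python
import numpy as np


def gaussian_kernel(X, Z, bandwidth):
    """Gaussian kernel matrix between the rows of X and the rows of Z."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / bandwidth)


def balancing_weights(X_control, X_treated, bandwidth=2.0, n_components=5):
    """Weights on control units that match treated means on a kernel basis.

    Assumption: the leading eigenvectors of the kernel matrix stand in for
    the "basis functions" mentioned in the abstract.
    """
    X_all = np.vstack([X_control, X_treated])
    n_control = X_control.shape[0]

    K = gaussian_kernel(X_all, X_all, bandwidth)
    # eigh returns eigenvalues in ascending order, so take the last columns.
    _, eigvecs = np.linalg.eigh(K)
    basis = eigvecs[:, -n_components:]

    B_control = basis[:n_control]
    target = basis[n_control:].mean(axis=0)  # treated means on each basis column

    # Linear constraints: weighted control means equal treated means,
    # and the weights sum to one.
    A = np.vstack([B_control.T, np.ones(n_control)])
    b = np.append(target, 1.0)

    # Minimum-norm solution of the underdetermined system (may be negative).
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w, B_control, target


def att_difference_in_means(y_control, y_treated, w):
    """ATT estimate: treated mean minus weighted control mean."""
    return y_treated.mean() - w @ y_control
```

Given outcome arrays `y_control` and `y_treated`, `att_difference_in_means(y_control, y_treated, w)` corresponds to the "subsequent difference in means" the abstract mentions. A production version would constrain the weights to be non-negative.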


Adaptive Concentration of Regression Trees, with Application to Random Forests

arXiv.org Machine Learning

We study the convergence of the predictive surface of regression trees and forests. To support our analysis we introduce a notion of adaptive concentration for regression trees. This approach breaks tree training into a model selection phase in which we pick the tree splits, followed by a model fitting phase where we find the best regression model consistent with these splits. We then show that the fitted regression tree concentrates around the optimal predictor with the same splits: as d and n get large, the discrepancy is with high probability bounded on the order of sqrt(log(d) log(n)/k) uniformly over the whole regression surface, where d is the dimension of the feature space, n is the number of training examples, and k is the minimum leaf size for each tree. We also provide rate-matching lower bounds for this adaptive concentration statement. From a practical perspective, our result enables us to prove consistency results for adaptively grown forests in high dimensions, and to carry out valid post-selection inference in the sense of Berk et al. [2013] for subgroups defined by tree leaves.
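Written out, the concentration rate quoted in this abstract has roughly the following shape. The symbols $\hat{T}$ (the fitted tree) and $T^{*}$ (the optimal predictor with the same splits) are notation assumed here for illustration, and constants and exact conditions are omitted because the abstract does not give them.

```latex
\sup_{x} \left| \hat{T}(x) - T^{*}(x) \right|
  \;\lesssim\; \sqrt{\frac{\log(d)\,\log(n)}{k}}
  \quad \text{with high probability,}
```

Here $d$ is the dimension of the feature space, $n$ the number of training examples, and $k$ the minimum leaf size, so the bound tightens only when $k$ grows faster than $\log(d)\log(n)$.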


Machine Learning Algorithms Mini-Course - Machine Learning Mastery

#artificialintelligence

Machine learning algorithms are a very large part of machine learning. You have to understand how they work to make any progress in the field. In this post you will discover a 14-part machine learning algorithms mini-course that you can follow to finally understand machine learning algorithms. We are going to cover a lot of ground in this course and you are going to have a great time. Before we get started, let's make sure you are in the right place. This mini-course will take you on a guided tour of machine learning algorithms, from foundations through 10 top techniques.


The Art of Data Science Part 1

@machinelearnbot

Data scientist communities have their own complex jargon: multivariate regression models, Big Data engineering, Hadoop, Map Reduce, Deep Learning, etc. But, unfortunately, businesses do not seem to care about how complex the term is or how impressive the math is! They want the results explained in non-tech terms. While working on Big Data and planning to implement it for the benefit of the business, it is very important to explain the insights and valuable knowledge in a way that a non-technical business user can actually understand. Here is my recent experience while working on a project for one of the largest food retailers. The goal of this project was to determine how incentivisation would help improve their overall profits.


Jackknife and linear regression in Excel: implementation and comparison

@machinelearnbot

The comparison is performed on a data set where linear regression works well: salary offered to a candidate, based on programming language requirements in the job ad (Python, R or SQL). This is a follow-up to the article on the highest paying programming skills. The increased accuracy of linear regression estimates is negligible, and well below the noise level present in the data set. The Jackknife method has the advantage of being more stable, easy to code, easy to understand (no need to know matrix algebra), and easy to interpret (meaningful coefficients). Jackknife is not the first regression approximation developed by the author: check my book, pages 172-176, for other examples.


Semi-supervised Vocabulary-informed Learning

arXiv.org Machine Learning

Despite significant progress in object categorization in recent years, a number of important challenges remain, mainly the ability to learn from limited labeled data and the ability to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited-size class vocabularies and typically requires separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of semi-supervised vocabulary-informed learning to alleviate the above-mentioned challenges and address problems of supervised, zero-shot and open set recognition using a unified framework. Specifically, we propose a maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms, ensuring that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We show that the resulting model shows improvements in supervised, zero-shot, and large open set recognition, with up to 310K class vocabulary on AwA and ImageNet datasets.


Large Scale Decision Forests: Lessons Learned

#artificialintelligence

We at Sift Science provide fraud detection for hundreds of customers spanning many industries and use cases. To do this, we have devised a specialized modeling stack that is able to adapt to individual customers while simultaneously delivering a great out-of-box experience for new customers, achieved by mixing the output from a "global" model – trained on our entire network of data – with the output from a customer's individualized model. Prior to decision forests, we used a custom-built logistic regression classifier combined with highly specialized feature engineering for our global model. While logistic regression has many great attributes, it is fundamentally limited by its inability to model non-linear interactions between features. At Sift, we tend to think of our modeling stack primarily as an enabler of our feature engineering; more powerful modeling allows us to extract the most insight from our features and can even lead to new classes of features.


Handling Imbalanced data when building regression models

@machinelearnbot

This is a good question, and one that seems to get raised time and time again. A colleague (Sven Crone from Lancaster University in the UK) and I published a paper on this issue last year in the International Journal of Forecasting. A summary of our findings can also be found in the book "Credit Scoring, Response Modeling and Insurance Rating". There are also some very good papers by G. Weiss from 2004/5 which are highly cited and referenced in our paper/book. What we found was that for some methods of model construction sample imbalance was not an issue at all – not even a tiny amount.


R Squared Theory - Practical Machine Learning Tutorial with Python p.10

#artificialintelligence

Welcome to the 10th part of our machine learning regression tutorial within our Machine Learning with Python tutorial series. We've just recently finished creating a working linear regression model, and now we're curious what is next. Right now, we can easily look at the data and decide how "accurate" the regression line is to some degree. What happens, however, when your linear regression model is applied within 20 hierarchical layers in a neural network? Not only this, but your model works in steps, or windows, of say 100 data points at a time, within a dataset of 5 million datapoints.
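The R squared value named in this tutorial's title, also called the coefficient of determination, puts a number on how "accurate" a regression line is: one minus the ratio of the squared error of the line to the squared error of simply predicting the mean. The tutorial's own code is not reproduced in this excerpt, so the following is a minimal sketch with an assumed function name (`r_squared`) and made-up sample data.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared error of the line
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of the mean
    return 1.0 - ss_res / ss_tot


# Illustrative data only: fit a straight line, then score it.
xs = np.array([1, 2, 3, 4, 5, 6], dtype=float)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=float)
slope, intercept = np.polyfit(xs, ys, 1)
print(r_squared(ys, slope * xs + intercept))
```

A value near 1 means the line explains most of the variation in `ys`. A value near 0 means it does little better than predicting the mean.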


Comparison of statistical software

@machinelearnbot

Dear Dr. Granville, the Stata / SPSS implementation of non-linear regression is quite inflexible. The SPSS implementation is especially bad, allowing one to type only simple, non-recursive functions in a pop-up window. If you had a project that involved implementing a non-linear regression for a complex functional form, you would use R, Matlab or a similar programming language. Following the general vibe of responses, I changed the "Non-linear Regression / SPSS" field to "Limited" to avoid potential misinterpretations of the table. However, the truth is that the SPSS implementation of non-linear regression is unsatisfactory for most industry-level research. Lasso is available in SPSS only as part of categorical regression, which does not cover linear regression and generalized linear models.