Regression
On Estimation of Conditional Modes Using Multiple Quantile Regressions
The estimation of the conditional mode, or modal regression [24, 11, 5, 22], is an important topic in statistics [21, 25, 24], econometrics [16, 17, 8, 15, 11], and machine learning [7, 22]. Compared to ordinary regression, modal regression is particularly useful when the data distribution is highly skewed and has fat tails. In such a situation, ordinary regression, which estimates the conditional mean of the distribution, fails to capture the major trend underlying the data. This is because the conditional mean is not necessarily the point where the data are most densely concentrated, i.e., it can be far away from the majority of the data. The conditional mode is a convenient alternative to the conditional mean in this situation because it tracks where most of the data lie. Hence, with modal regression, we can find the major trend underlying the data.
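The gap between mean and mode on skewed data can be seen in a toy example. The sketch below uses made-up values and the unconditional mean and mode from Python's standard library; it only illustrates the motivation and is not the paper's multiple-quantile-regression estimator.

```python
import statistics

# Hypothetical, heavily right-skewed sample: most values sit near 1,
# and a single large value forms the fat tail.
sample = [1, 1, 1, 1, 2, 2, 3, 50]

sample_mean = statistics.mean(sample)  # pulled toward the outlier
sample_mode = statistics.mode(sample)  # stays where the data are dense

# Count how many observations lie closer to the mode than to the mean.
closer_to_mode = sum(
    abs(v - sample_mode) < abs(v - sample_mean) for v in sample
)

print(f"mean={sample_mean}, mode={sample_mode}, "
      f"closer to mode: {closer_to_mode} of {len(sample)}")
```

In this toy sample, all but the single tail value lie closer to the mode than to the mean, which is the behavior the passage describes.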
How to create a sliced fit plot in SAS
I previously showed an easy way to visualize a regression model that has several continuous explanatory variables: use the SLICEFIT option in the EFFECTPLOT statement in SAS to create a sliced fit plot. The EFFECTPLOT statement is directly supported by the syntax of the GENMOD, LOGISTIC, and ORTHOREG procedures in SAS/STAT. Most parametric regression procedures in SAS (GLM, GLIMMIX, MIXED, ...) support the STORE statement, which enables you to save a representation of the model in a SAS item store. The following program creates sample data for 500 patients in a medical study. The call to PROC GLM fits a linear regression model that predicts the level of cholesterol from five explanatory variables.
Robust Detection of Covariate-Treatment Interactions in Clinical Trials
Goujaud, Baptiste, Tramel, Eric W., Courtiol, Pierre, Zaslavskiy, Mikhail, Wainrib, Gilles
Designing new and efficient therapies is a long and ever more costly process, with less than ten percent of new treatments entering Phase I finally being approved by the FDA and commercialized [1, 2]. One of the major challenges for the improvement of drug development is to better understand how drugs interact with patients, particularly for treatments displaying heterogeneous responses. Therefore, conducting a detailed analysis of clinical trial data is critical to find subgroups of patients with higher benefit-risk ratio or to understand why a drug does not work on some subpopulation to improve existing therapeutic strategies. Moreover, understanding the relationships of patient descriptors which compose the most responsive cross-section of the population is of great importance when planning a Phase III trial, for salvaging failed trials, or accelerating advances in personalized medicine. This process of biomarker identification is critical to detect subgroups within a given indication, but, as shown recently for immunotherapies, can also provide the basis for pan-indication drug approval [3].
A Convex Program for Mixed Linear Regression with a Recovery Guarantee for Well-Separated Data
We introduce a convex approach for mixed linear regression over $d$ features. This approach is a second-order cone program, based on L1 minimization, which assigns an estimated regression coefficient in $\mathbb{R}^{d}$ to each data point. These estimates can then be clustered using, for example, $k$-means. For problems with two or more mixture classes, we prove that the convex program exactly recovers all of the mixture components in the noiseless setting under technical conditions that include a well-separation assumption on the data. Under these assumptions, recovery is possible if each class has at least $d$ independent measurements. We also explore an iteratively reweighted least squares implementation of this method on real and synthetic data.
Orthogonal Machine Learning: Power and Limitations
Mackey, Lester, Syrgkanis, Vasilis, Zadik, Ilias
Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate. The key is to employ Neyman-orthogonal moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the $n^{-1/4}$ requirement can be improved to $n^{-1/(2k+2)}$ by employing a $k$-th order notion of orthogonality that grants robustness to more complex or higher-dimensional nuisance parameters. In the partially linear regression setting popular in causal inference, we show that we can construct second-order orthogonal moments if and only if the treatment residual is not normally distributed. Our proof relies on Stein's lemma and may be of independent interest. We conclude by demonstrating the robustness benefits of an explicit doubly-orthogonal estimation procedure for treatment effect.
On Computationally Tractable Selection of Experiments in Measurement-Constrained Regression Models
Wang, Yining, Yu, Adams Wei, Singh, Aarti
We derive computationally tractable methods to select a small subset of experiment settings from a large pool of given design points. The primary focus is on linear regression models, while the technique extends to generalized linear models and Delta's method (estimating functions of linear regression models) as well. The algorithms are based on a continuous relaxation of an otherwise intractable combinatorial optimization problem, with sampling or greedy procedures as post-processing steps. Formal approximation guarantees are established for both algorithms, and numerical results on both synthetic and real-world data confirm the effectiveness of the proposed methods.
Machine Learning with Oracle JET and TensorFlow - Oracle Developers - Medium
Oracle JET works with any kind of REST service, and such a service could be one backed by TensorFlow (read more in my previous post -- TensorFlow Linear Regression Model Access with Custom REST API using Flask). There is an option to define the number of training steps (or data points) and the learning rate. As an outcome we get the W and b values for the linear equation y = Wx + b. After training is executed (the so-called machine learning process), the W and b parameters are identified, which allows us to predict the y value for any x. More about this in my next post; today I will focus on JET.
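To make the roles of the training steps and the learning rate concrete, here is a minimal sketch of fitting W and b by batch gradient descent in plain Python. It is not the post's TensorFlow or Flask code; the function name `fit_line` and the sample data are assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = W*x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; not part of TensorFlow or JET.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up noiseless data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)

# Once W and b are known, y can be predicted for any x.
prediction = w * 10.0 + b
print(f"W={w:.4f}, b={b:.4f}, prediction at x=10: {prediction:.4f}")
```

Too large a learning rate makes the updates diverge, and too few steps leave W and b short of their best values, which is presumably why the page exposes both settings.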
Machine Learning - Dzone Refcardz
To avoid over-fitting (the trained model fits the training data too closely and does not generalize well), regularization is used to shrink the magnitude of the coefficients θi. This is done by adding a penalty (a function of the θi) to the cost function. In L2 regularization (also known as Ridge regression), the sum of θi² is added to the cost function. In L1 regularization (also known as Lasso regression), the sum of |θi| is added to the cost function. Both L1 and L2 regularization shrink the magnitude of θi.
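The two penalties can be written out as a short sketch in plain Python. The function names, the penalty weight `lam`, and the toy data are assumptions for illustration. For simplicity every coefficient is penalized, whereas many implementations leave the intercept unpenalized.

```python
def mse(theta, rows, targets):
    """Mean squared error of the linear model with coefficients theta."""
    total = 0.0
    for features, target in zip(rows, targets):
        prediction = sum(t * f for t, f in zip(theta, features))
        total += (prediction - target) ** 2
    return total / len(rows)


def ridge_cost(theta, rows, targets, lam):
    """L2 (Ridge): add lam times the sum of squared coefficients."""
    return mse(theta, rows, targets) + lam * sum(t * t for t in theta)


def lasso_cost(theta, rows, targets, lam):
    """L1 (Lasso): add lam times the sum of absolute coefficients."""
    return mse(theta, rows, targets) + lam * sum(abs(t) for t in theta)


# Toy data that theta fits exactly, so the whole cost is the penalty.
rows = [[1.0, 0.0], [0.0, 1.0]]
targets = [3.0, -4.0]
theta = [3.0, -4.0]

ridge = ridge_cost(theta, rows, targets, lam=0.1)
lasso = lasso_cost(theta, rows, targets, lam=0.1)
print(f"ridge cost={ridge:.3f}, lasso cost={lasso:.3f}")
```

Because the data term is zero here, any reduction in the cost can only come from smaller coefficients, which is the shrinkage effect described above.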
Accurate Inference for Adaptive Linear Models
Deshpande, Yash, Mackey, Lester, Syrgkanis, Vasilis, Taddy, Matt
Estimators computed from adaptively collected data do not behave like their non-adaptive brethren. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. We develop a general method -- W-decorrelation -- for transforming the bias of adaptive linear regression estimators into variance. The method uses only coarse-grained information about the data collection policy and does not need access to propensity scores or exact knowledge of the policy. We bound the finite-sample bias and variance of the W-estimator and develop asymptotically correct confidence intervals based on a novel martingale central limit theorem. We then demonstrate the empirical benefits of the generic W-decorrelation procedure in two different adaptive data settings: multi-armed bandits and autoregressive time series models.
Group-By Modeling in R Made Easy
There are several aspects of the R language that make it hard to learn, and repeating a model for groups in a data set used to be one of them. Here I briefly describe R's built-in approach, show a much easier one, then refer you to a new approach described in the superb book, R for Data Science, by Hadley Wickham and Garrett Grolemund. The gapminder data set contains a few measurements for countries around the world every five years from 1952 through 2007. Let's create a simple regression model to predict life expectancy from year. We'll start by looking at just New Zealand.