 Regression


Computationally Efficient Robust Estimation of Sparse Functionals

arXiv.org Machine Learning

Complex high-dimensional datasets pose a variety of computational and statistical challenges. In attempts to address these challenges, the past decade has witnessed a significant amount of research on sparsity constraints in statistical models. Sparsity constraints have practical and theoretical benefits: they often lead to more interpretable models that can be estimated efficiently even in the high-dimensional regime, where the sample size n can be dwarfed by the model dimension d. In addition to being convenient from a methodological and theoretical standpoint, sparse models have also had enormous practical impact, for instance in computational biology, neuroscience and applied machine learning. On the other hand, much of the theoretical literature on sparse estimation has focused on providing guarantees under strong, often impractical, generative assumptions.


GapTV: Accurate and Interpretable Low-Dimensional Regression and Classification

arXiv.org Machine Learning

We consider the problem of estimating a regression function in the common situation where the number of features is small, where interpretability of the model is a high priority, and where simple linear or additive models fail to provide adequate performance. To address this problem, we present GapTV, an approach that is conceptually related both to CART and to the more recent CRISP algorithm, a state-of-the-art alternative method for interpretable nonlinear regression. GapTV divides the feature space into blocks of constant value and fits the value of all blocks jointly via a convex optimization routine. Our method is fully data-adaptive, in that it incorporates highly robust routines for tuning all hyperparameters automatically. We compare our approach against CART and CRISP and demonstrate that GapTV finds a much better trade-off between accuracy and interpretability.


Causal Regularization

arXiv.org Machine Learning

In application domains such as healthcare, we want accurate predictive models that are also causally interpretable. In pursuit of such models, we propose a causal regularizer to steer predictive models towards causally-interpretable solutions and theoretically study its properties. In a large-scale analysis of Electronic Health Records (EHR), our causally-regularized model outperforms its L1-regularized counterpart in causal accuracy and is competitive in predictive performance. We perform non-linear causality analysis by causally regularizing a special neural network architecture. We also show that the proposed causal regularizer can be used together with neural representation learning algorithms to yield up to 20% improvement over a multilayer perceptron in detecting multivariate causation, a situation common in healthcare, where many causal factors must occur simultaneously to have an effect on the target variable.


A Unified Parallel Algorithm for Regularized Group PLS Scalable to Big Data

arXiv.org Machine Learning

Partial Least Squares (PLS) methods have been heavily exploited to analyse the association between two blocks of data. These powerful approaches can be applied to data sets where the number of variables is greater than the number of observations and in the presence of high collinearity between variables. Different sparse versions of PLS have been developed to integrate multiple data sets while simultaneously selecting the contributing variables. Sparse modelling is a key factor in obtaining better estimators and identifying associations between multiple data sets. The cornerstone of the sparse versions of PLS methods is the link between the SVD of a matrix (constructed from deflated versions of the original matrices of data) and least squares minimisation in linear regression. We present here an accurate description of the most popular PLS methods, alongside their mathematical proofs. A unified algorithm is proposed to perform all four types of PLS including their regularised versions. Various approaches to decrease the computation time are offered, and we show how the whole procedure can be scaled to big data sets.


Learning Optimal Interventions

arXiv.org Machine Learning

Our goal is to identify beneficial interventions from observational data. We consider interventions that are narrowly focused (impacting few covariates) and may be tailored to each individual or globally enacted over a population. For applications where harmful intervention is drastically worse than proposing no change, we propose a conservative definition of the optimal intervention. Assuming the underlying relationship remains invariant under intervention, we develop efficient algorithms to identify the optimal intervention policy from limited data and provide theoretical guarantees for our approach in a Gaussian Process setting. Although our methods assume covariates can be precisely adjusted, they remain capable of improving outcomes in misspecified settings where interventions incur unintentional downstream effects. Empirically, our approach identifies good interventions in two practical applications: gene perturbation and writing improvement.


A Sparse Linear Model and Significance Test for Individual Consumption Prediction

arXiv.org Machine Learning

Accurate prediction of user consumption is a key part not only in understanding consumer flexibility and behavior patterns, but also in the design of robust and efficient energy saving programs. Existing prediction methods usually have high relative errors that can be larger than 30% and have difficulties accounting for heterogeneity between individual users. In this paper, we propose a method to improve prediction accuracy of individual users by adaptively exploring sparsity in historical data and leveraging predictive relationships between different users. Sparsity is captured by the popular least absolute shrinkage and selection operator (lasso), while user selection is formulated as an optimal hypothesis testing problem and solved via a covariance test. Using real world data from PG&E, we provide extensive simulation validation of the proposed method against well-known techniques such as support vector machine, principal component analysis combined with linear regression, and random forest. The results demonstrate that our proposed methods are operationally efficient because of their linear nature, and achieve optimal prediction performance.

Pan Li and Baosen Zhang are with the Department of Electrical Engineering, University of Washington, Seattle, WA, 98195, (email: {pli69, zhangbao}@uw.edu). Yang Weng and Ram Rajagopal are with the Civil and Environmental Department, Stanford University, Stanford, CA, 94035, (email: {yangweng, ramr}@stanford.edu).

Electric load forecasting is an important problem in the power engineering industry and has received extensive attention from both industry and academia over the last century. Many different forecasting techniques have been developed during this time. The authors in [1] present a comprehensive literature review on different methods related to load forecasting, from regression models to expert systems. Time series methods are further discussed in [2]. A thorough study of load and price forecasting is presented in [3]. A common theme among many of the established methods is that they are used to forecast relatively large loads, from substations serving megawatts to transmission networks serving more than gigawatts of power [4]. Recent advances in technology such as smart meters, bidirectional communication capabilities and distributed energy resources have made individual households active participants in the power system. Many applications and programs based on these new technologies require estimating the future load of individual homes.
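The abstract above captures sparsity with the lasso. As a rough illustration of how a lasso fit produces exact zeros, here is a minimal sketch of cyclic coordinate descent with soft-thresholding in plain Python. It is not the authors' implementation: the function names `soft_threshold` and `lasso_cd`, the objective scaling (1/2n), and the fixed iteration count are all assumptions made for this example, and the covariance test for user selection is not covered.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t; values inside [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Illustrative lasso solver (hypothetical helper, not from the paper).

    Minimises (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1 by cyclic
    coordinate descent. X is a list of rows, y a list of targets.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    resid = list(y)  # residual y - X w, which equals y while w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w
```

In a consumption setting, the columns of `X` could hold lagged readings of a household and of other users, so that zero coefficients drop uninformative lags; that mapping is an assumption here, not something the abstract specifies.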


Interpreting Outliers: Localized Logistic Regression for Density Ratio Estimation

arXiv.org Machine Learning

We propose an inlier-based outlier detection method capable of both identifying the outliers and explaining why they are outliers, by identifying the outlier-specific features. Specifically, we employ an inlier-based outlier detection criterion, which uses the ratio of inlier and test probability densities as a measure of plausibility of being an outlier. For estimating the density ratio function, we propose a localized logistic regression algorithm. Thanks to the locality of the model, variable selection can be outlier-specific, and will help interpret why points are outliers in a high-dimensional space. Through synthetic experiments, we show that the proposed algorithm can successfully detect the important features for outliers. Moreover, we show that the proposed algorithm tends to outperform existing algorithms in benchmark datasets.
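The link between logistic regression and density ratios can be sketched as follows. This example uses an ordinary (global, one-dimensional) logistic regression, not the localized variant the abstract proposes, so it shows only the underlying idea: a classifier trained to separate inliers from test points yields the ratio of the two densities through Bayes' rule. The names `fit_logistic` and `density_ratio`, the learning rate, and the iteration count are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, labels, lr=0.1, n_iter=2000):
    """Fit P(label = 1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        gw = gb = 0.0
        for x, t in zip(xs, labels):
            err = sigmoid(w * x + b) - t
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def density_ratio(inliers, test, lr=0.1, n_iter=2000):
    """Return a function estimating p_inlier(x) / p_test(x).

    Inliers are labelled 1 and test points 0. By Bayes' rule,
    p_in(x) / p_te(x) = (n_te / n_in) * P(1 | x) / P(0 | x),
    and P(1 | x) / P(0 | x) = exp(w * x + b) for a logistic model.
    """
    xs = list(inliers) + list(test)
    labels = [1.0] * len(inliers) + [0.0] * len(test)
    w, b = fit_logistic(xs, labels, lr, n_iter)
    prior = len(test) / len(inliers)
    return lambda x: prior * math.exp(w * x + b)
```

A test point with a small estimated ratio is a candidate outlier. With a per-point (localized) model and a sparsity penalty, as the abstract describes, the non-zero weights would additionally indicate which features make that point unusual; this sketch does not attempt that step.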


Best Linear Predictor with Missing Response: Locally Robust Approach

arXiv.org Machine Learning

This paper provides asymptotic theory for the Inverse Probability Weighting (IPW) and Locally Robust Estimator (LRE) of the Best Linear Predictor where the response is missing at random (MAR), but not completely at random (MCAR). We relax previous assumptions in the literature about the first-step nonparametric components, requiring only their mean square convergence. This relaxation allows the use of a wider class of machine learning methods for the first step, such as lasso. For a generic first step, IPW incurs a first-order bias unless the model it approximates is truly linear in the predictors. In contrast, LRE remains first-order unbiased provided one can estimate the conditional expectation of the response with sufficient accuracy. An additional novelty is allowing the dimension of the Best Linear Predictor to grow with sample size. These relaxations are important for estimation of the best linear predictor of teacher-specific and hospital-specific effects with a large number of individuals.


A Machine Learning Alternative to P-values

arXiv.org Machine Learning

This paper presents an alternative approach to p-values in regression settings. This approach, whose origins can be traced to machine learning, is based on the leave-one-out bootstrap for prediction error. In machine learning this is called the out-of-bag (OOB) error. To obtain the OOB error for a model, one draws a bootstrap sample and fits the model to the in-sample data. The out-of-sample prediction error for the model is obtained by calculating the prediction error for the model using the out-of-sample data. Repeating and averaging yields the OOB error, which represents a robust cross-validated estimate of the accuracy of the underlying model. By a simple modification to the bootstrap data involving "noising up" a variable, the OOB method yields a variable importance (VIMP) index, which directly measures how much a specific variable contributes to the prediction precision of a model. VIMP provides a scientifically interpretable measure of the effect size of a variable, which we call the "predictive effect size", that holds whether the researcher's model is correct or not, unlike the p-value, whose calculation is based on the assumed correctness of the model. We also discuss a marginal VIMP index, also easily calculated, which measures the marginal effect of a variable, or what we call "the discovery effect". The OOB procedure can be applied to both parametric and nonparametric regression models and requires only that the researcher can repeatedly fit their model to bootstrap and modified bootstrap data. We illustrate this approach on a survival data set involving patients with systolic heart failure and on a simulated survival data set where the model is incorrectly specified, to illustrate its robustness to model misspecification.
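The OOB procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the model is a one-variable least squares line, prediction error is mean squared error, and "noising up" is taken to mean permuting the variable among the out-of-bag points, which is one common choice. The names `fit_line` and `oob_error` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def oob_error(xs, ys, n_boot=200, noise_up=False, seed=0):
    """Average out-of-bag mean squared error over bootstrap draws.

    With noise_up=True the out-of-bag x values are permuted before
    prediction (an assumed form of "noising up"), which breaks their
    link to y while leaving the fitted model unchanged.
    """
    rng = random.Random(seed)
    n = len(xs)
    errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        out = [i for i in range(n) if i not in in_bag]
        if not out:
            continue  # no out-of-bag points in this draw
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        x_out = [xs[i] for i in out]
        if noise_up:
            rng.shuffle(x_out)
        mse = sum((ys[i] - (a + b * x)) ** 2 for i, x in zip(out, x_out)) / len(out)
        errors.append(mse)
    return sum(errors) / len(errors)
```

Under these assumptions, a VIMP-style index for the variable is the difference `oob_error(xs, ys, noise_up=True) - oob_error(xs, ys)`: a large positive value means predictions degrade sharply once the variable is scrambled, while a value near zero suggests the variable adds little predictive precision.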


Machine Learning: Why it Matters? - insideBIGDATA

#artificialintelligence

Are you into Machine Learning OR are you "just" a Statistician? Have you been asked this question yet? If you are in a career, or looking to get into one, that has anything to do with deriving insights out of data, you probably know what I am talking about. The year 2016 has seen over three dozen machine learning startups being acquired by tech giants; another several dozen machine learning startups raked in aggregate funding to the tune of $4 billion worldwide. Is it a blip or a bubble?