Regression
Heteroscedastic Gaussian Process Regression on the Alkenone over Sea Surface Temperatures
Lee, Taehee, Lawrence, Charles E.
To better reconstruct historical sea surface temperatures (SSTs), it is important to construct a good calibration model for the associated proxies. In this paper, we introduce a new model for alkenone (${\rm{U}}_{37}^{\rm{K}'}$) based on the heteroscedastic Gaussian process (GP) regression method. Our nonparametric approach not only handles noise whose level varies across SSTs but also includes a Bayesian method for classifying potential outliers.
Benchmarking the Neural Linear Model for Regression
Ober, Sebastian W., Rasmussen, Carl Edward
The neural linear model is a simple adaptive Bayesian linear regression method that has recently been used in a number of problems ranging from Bayesian optimization to reinforcement learning. Despite its apparent successes in these settings, to the best of our knowledge there has been no systematic exploration of its capabilities on simple regression tasks. In this work we characterize these on the UCI datasets, a popular benchmark for Bayesian regression models, as well as on the recently introduced UCI "gap" datasets, which are better tests of out-of-distribution uncertainty. We demonstrate that the neural linear model is a simple method that shows generally good performance on these tasks, but at the cost of requiring good hyperparameter tuning.
Extrinsic Kernel Ridge Regression Classifier for Planar Kendall Shape Space
Lee, Hwiyoung, Patrangenaru, Vic
Kernel methods have had great success in the statistics and machine learning community. Despite their growing popularity, however, less effort has been devoted to developing kernel-based classification methods on manifolds, owing to their non-Euclidean geometry. In this paper, motivated by the extrinsic framework of manifold-valued data analysis, we propose two new types of kernels on the planar Kendall shape space $\Sigma_2^k$, called the extrinsic Veronese Whitney Gaussian kernel and the extrinsic complex Gaussian kernel. We show that our approach can be extended to develop Gaussian-like kernels on any embedded manifold. Furthermore, a kernel ridge regression classifier (KRRC) is implemented to address the shape classification problem on $\Sigma_2^k$, and its promising performance is illustrated on a real dataset.
Back to the Basics: How Does Machine Learning Actually Work? AgileThought
Machine learning is a hot topic with many businesses investing in the technology--but often, the businesses investing in this space don't have a solid understanding of the basics of machine learning, which can lead to poor results. Let's remedy this problem by explaining machine learning in simple terms: In traditional software engineering, a problem is decomposed into smaller problems--then each problem is solved using brute force techniques with hard-coded rules. And by hard-coded rules, I mean that for each case of inputs, a new block of logic (code) must be written to handle it. As a working example throughout this post, let's say we're trying to determine whether a given email is spam or not, as Gmail does. In this case, imagine that we're using the number of all-caps words in the email as input.
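The hard-coded approach described above can be sketched roughly as follows. This is only an illustration: the function names and the threshold of three all-caps words are hypothetical and do not come from the post.

```python
# Hypothetical cut-off chosen by hand; a learned model would fit this from data.
HYPOTHETICAL_THRESHOLD = 3


def count_all_caps_words(text: str) -> int:
    """Count words of at least two characters whose letters are all upper case."""
    return sum(1 for word in text.split() if len(word) > 1 and word.isupper())


def is_spam_hard_coded(text: str) -> bool:
    """Hard-coded rule: flag the email once the all-caps count reaches the threshold."""
    return count_all_caps_words(text) >= HYPOTHETICAL_THRESHOLD


print(is_spam_hard_coded("WIN a FREE prize NOW"))
print(is_spam_hard_coded("I met Anna at noon"))
```

Every new kind of input (for example, spam written in lower case) would need another hand-written rule, which is the limitation the paragraph above points to.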
7 Regression Types and Techniques in Data Science
Linear and Logistic regressions are usually the first algorithms people learn in data science. Due to their popularity, a lot of analysts even end up thinking that they are the only forms of regression. The ones who are slightly more involved think that they are the most important among all forms of regression analysis. The truth is that there are innumerable forms of regression that can be performed. Each form has its own importance and specific conditions under which it is best suited.
Learning Mixtures of Linear Regressions in Subexponential Time via Fourier Moments
Chen, Sitan, Li, Jerry, Song, Zhao
We consider the problem of learning a mixture of linear regressions (MLRs). An MLR is specified by $k$ nonnegative mixing weights $p_1, \ldots, p_k$ summing to $1$, and $k$ unknown regressors $w_1,...,w_k\in\mathbb{R}^d$. A sample from the MLR is drawn by sampling $i$ with probability $p_i$, then outputting $(x, y)$ where $y = \langle x, w_i \rangle + \eta$, where $\eta\sim\mathcal{N}(0,\varsigma^2)$ for noise rate $\varsigma$. Mixtures of linear regressions are a popular generative model and have been studied extensively in machine learning and theoretical computer science. However, all previous algorithms for learning the parameters of an MLR require running time and sample complexity scaling exponentially with $k$. In this paper, we give the first algorithm for learning an MLR that runs in time which is sub-exponential in $k$. Specifically, we give an algorithm which runs in time $\widetilde{O}(d)\cdot\exp(\widetilde{O}(\sqrt{k}))$ and outputs the parameters of the MLR to high accuracy, even in the presence of nontrivial regression noise. We demonstrate a new method that we call "Fourier moment descent" which uses univariate density estimation and low-degree moments of the Fourier transform of suitable univariate projections of the MLR to iteratively refine our estimate of the parameters. To the best of our knowledge, these techniques have never been used in the context of high dimensional distribution learning, and may be of independent interest. We also show that our techniques can be used to give a sub-exponential time algorithm for learning mixtures of hyperplanes, a natural hard instance of the subspace clustering problem.
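The generative process defined at the start of this abstract can be sketched as below. The abstract does not state the covariate distribution, so drawing $x$ from a standard Gaussian is an assumption made here for illustration; the function name `sample_mlr` and the example parameters are likewise hypothetical.

```python
import random


def sample_mlr(weights, regressors, noise_std, rng):
    """Draw one sample from a mixture of linear regressions.

    Returns (x, y, i); the component index i is latent and would not be
    observed by a learner, it is returned only for inspection.
    """
    k = len(weights)
    d = len(regressors[0])
    # Choose component i with probability p_i.
    i = rng.choices(range(k), weights=weights, k=1)[0]
    # Assumption: covariates are i.i.d. standard Gaussian.
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Additive Gaussian regression noise eta with standard deviation noise_std.
    eta = rng.gauss(0.0, noise_std)
    y = sum(xj * wj for xj, wj in zip(x, regressors[i])) + eta
    return x, y, i


rng = random.Random(0)
weights = [0.5, 0.3, 0.2]                             # p_1, ..., p_k, summing to 1
regressors = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # w_1, ..., w_k in R^d
samples = [sample_mlr(weights, regressors, 0.1, rng) for _ in range(5)]
```

The learning problem is to recover `weights` and `regressors` from the `(x, y)` pairs alone.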
More Data Can Hurt for Linear Regression: Sample-wise Double Descent
In this expository note we describe a surprising phenomenon in overparameterized linear regression, where the dimension exceeds the number of samples: there is a regime where the test risk of the estimator found by gradient descent increases with additional samples. In other words, more data actually hurts the estimator. This behavior is implicit in a recent line of theoretical works analyzing the "double-descent" phenomenon in linear models. In this note, we isolate and understand this behavior in an extremely simple setting: linear regression with isotropic Gaussian covariates. In particular, this occurs due to an unconventional type of bias-variance tradeoff in the overparameterized regime: the bias decreases with more samples, but the variance increases.
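The tradeoff can be made concrete with the standard closed-form expressions for the minimum-norm least-squares interpolator (the solution that gradient descent started from zero converges to) under isotropic Gaussian covariates, valid for $n \le d - 2$. The sketch below is an illustration under those assumptions; the function name and the example values of $d$, $\|\beta\|^2$ and $\sigma^2$ are chosen here and are not taken from the note.

```python
def overparam_risk_terms(n, d, beta_norm_sq, noise_var):
    """Expected bias and variance terms of E||beta_hat - beta||^2.

    Assumes the minimum-norm interpolator, n samples with isotropic
    Gaussian covariates in dimension d, and n <= d - 2.
    """
    if not 0 <= n <= d - 2:
        raise ValueError("formulas assume 0 <= n <= d - 2")
    # beta_hat only captures the projection of beta onto the row space of X.
    bias = beta_norm_sq * (1.0 - n / d)
    # The noise contribution grows sharply as n approaches d.
    variance = noise_var * n / (d - n - 1)
    return bias, variance


for n in (10, 50, 90, 98):
    bias, variance = overparam_risk_terms(n, d=100, beta_norm_sq=1.0, noise_var=0.1)
    print(n, round(bias, 3), round(variance, 3), round(bias + variance, 3))
```

With these example values the bias term shrinks as $n$ grows while the variance term rises, and the total is larger at $n = 90$ than at $n = 50$.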
Efficient adjustment sets for population average treatment effect estimation in non-parametric causal graphical models
Rotnitzky, Andrea, Smucler, Ezequiel
The method of covariate adjustment is often used for estimation of population average treatment effects in observational studies. Graphical rules for determining all valid covariate adjustment sets from an assumed causal graphical model are well known. Restricting attention to causal linear models, a recent article derived two novel graphical criteria: one to compare the asymptotic variance of linear regression treatment effect estimators that control for certain distinct adjustment sets and another to identify the optimal adjustment set that yields the least squares treatment effect estimator with the smallest asymptotic variance among consistent adjusted least squares estimators. In this paper we show that the same graphical criteria can be used in non-parametric causal graphical models when treatment effects are estimated by contrasts involving non-parametrically adjusted estimators of the interventional means. We also provide a graphical criterion for determining the optimal adjustment set among the minimal adjustment sets, which is valid for both linear and non-parametric estimators. We provide a new graphical criterion for comparing time dependent adjustment sets, that is, sets comprising covariates that adjust for future treatments and that are themselves affected by earlier treatments. We show by example that uniformly optimal time dependent adjustment sets do not always exist. In addition, for point interventions, we provide a sound and complete graphical criterion for determining when a non-parametric optimally adjusted estimator of an interventional mean, or of a contrast of interventional means, is as efficient as an efficient estimator of the same parameter that exploits the information in the conditional independencies encoded in the non-parametric causal graphical model.
Estimation and Validation of a Class of Conditional Average Treatment Effects Using Observational Data
Yadlowsky, Steve, Pellegrini, Fabio, Lionetto, Federica, Braune, Stefan, Tian, Lu
While sample sizes in randomized clinical trials are large enough to estimate the average treatment effect well, they are often insufficient for estimation of treatment-covariate interactions critical to studying data-driven precision medicine. Observational data from real world practice may play an important role in alleviating this problem. One common approach in trials is to predict the outcome of interest with separate regression models in each treatment arm, and recommend interventions based on the contrast of the predicted outcomes. Unfortunately, this simple approach may induce spurious treatment-covariate interaction in observational studies when the regression model is misspecified. Motivated by the need to model the number of relapses in multiple sclerosis patients, where the ratio of relapse rates is a natural choice of the treatment effect, we propose to estimate the conditional average treatment effect (CATE) as the relative ratio of the potential outcomes, and derive a doubly robust estimator of this CATE in a semiparametric model of treatment-covariate interactions. We also provide a validation procedure to check the quality of the estimator on an independent sample. We conduct simulations to demonstrate the finite sample performance of the proposed methods, and illustrate the advantage of this approach on real data examining the treatment effect of dimethyl fumarate compared to teriflunomide in multiple sclerosis patients.
Bayesian Linear Regression on Deep Representations
Moberg, John, Svensson, Lennart, Pinto, Juliano, Wymeersch, Henk
A simple approach to obtaining uncertainty-aware neural networks for regression is to do Bayesian linear regression (BLR) on the representation from the last hidden layer. Recent work [Riquelme et al., 2018, Azizzadenesheli et al., 2018] indicates that the method is promising, though it has been limited to homoscedastic noise. In this paper, we propose a novel variation that enables the method to flexibly model heteroscedastic noise. The method is benchmarked against two prominent alternative methods on a set of standard datasets, and finally evaluated as an uncertainty-aware model in model-based reinforcement learning. Our experiments indicate that the method is competitive with standard ensembling, and that ensembles of BLR outperform the methods we compared against.
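For reference, the standard homoscedastic form of BLR on fixed features can be written as follows. This is the textbook baseline, not the heteroscedastic variant the paper proposes, and the notation is an assumption introduced here: $\Phi$ is the matrix whose rows are last-layer features $\phi(x_n)^\top$, $\alpha$ is the precision of a zero-mean isotropic Gaussian prior on the weights, and $\beta$ is the noise precision.

$$
S_N^{-1} = \alpha I + \beta\, \Phi^\top \Phi, \qquad
m_N = \beta\, S_N \Phi^\top y,
$$

$$
p(y_* \mid x_*, \Phi, y) = \mathcal{N}\!\left(y_* \,\middle|\, m_N^\top \phi(x_*),\; \beta^{-1} + \phi(x_*)^\top S_N\, \phi(x_*)\right).
$$

The predictive variance splits into a constant noise term $\beta^{-1}$ and a term reflecting uncertainty in the weights; because $\beta^{-1}$ does not depend on $x_*$, this form cannot represent input-dependent noise.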