Regression
Sparse Gaussian Process Regression Beyond Variational Inference
Jankowiak, Martin, Pleiss, Geoff, Gardner, Jacob R.
The combination of inducing point methods with stochastic variational inference has enabled approximate Gaussian Process (GP) inference on large datasets. Unfortunately, the resulting predictive distributions often exhibit substantially underestimated uncertainties. Worse still, in the regression case the predictive variance is typically dominated by observation noise, yielding uncertainty estimates that make little use of the input-dependent function uncertainty that makes GP priors attractive. In this work we propose a simple inference procedure that bypasses posterior approximations and instead directly targets the posterior predictive distribution. In an extensive empirical comparison with a number of alternative inference strategies on univariate and multivariate regression tasks, we find that the resulting predictive distributions exhibit significantly better calibrated uncertainties and higher log likelihoods--often by as much as half a nat or more per datapoint.
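To make the remark about observation noise concrete, the decomposition below is a minimal sketch for GP regression with a homoscedastic Gaussian likelihood. The symbols $\sigma_f^2(x_*)$ and $\sigma_{\mathrm{obs}}^2$ are notation assumed here and are not taken from the abstract.

$$\mathrm{Var}[y_* \mid x_*] = \underbrace{\sigma_f^2(x_*)}_{\text{input-dependent function uncertainty}} + \underbrace{\sigma_{\mathrm{obs}}^2}_{\text{observation noise}}$$

When $\sigma_{\mathrm{obs}}^2 \gg \sigma_f^2(x_*)$ for all $x_*$, the predictive variance is nearly constant in the input, which is the behaviour the abstract describes.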
Learning Sample-Specific Models with Low-Rank Personalized Regression
Lengerich, Benjamin, Aragam, Bryon, Xing, Eric P.
Modern applications of machine learning (ML) deal with increasingly heterogeneous datasets comprised of data collected from overlapping latent subpopulations. As a result, traditional models trained over large datasets may fail to recognize highly predictive localized effects in favour of weakly predictive global patterns. This is a problem because localized effects are critical to developing individualized policies and treatment plans in applications ranging from precision medicine to advertising. To address this challenge, we propose to estimate sample-specific models that tailor inference and prediction at the individual level. In contrast to classical ML models that estimate a single, complex model (or only a few complex models), our approach produces a model personalized to each sample. These sample-specific models can be studied to understand subgroup dynamics that go beyond coarse-grained class labels. Crucially, our approach does not assume that relationships between samples (e.g. a similarity network) are known a priori. Instead, we use unmodeled covariates to learn a latent distance metric over the samples. We apply this approach to financial, biomedical, and electoral data as well as simulated data and show that sample-specific models provide fine-grained interpretations of complicated phenomena without sacrificing predictive accuracy compared to state-of-the-art models such as deep neural networks.
A note on the consistency of the random forest algorithm
Nowadays, the algorithm is acknowledged to be easy to use and to perform very well in general, even in problems involving many predictor variables (see for instance Biau and Scornet (2016) or the introduction to Scornet, Biau and Vert (2015)), so well, indeed, that several authors have posed and studied the question of its consistency (see Scornet, Biau and Vert (2015) and the earlier references provided by them). Consistent nonparametric statistical predictors have been known for a long time (e.g. Nadaraya (1964), Watson (1964), Stone (1977), Devroye and Wagner (1980)), but they converge very slowly and their computer implementations tend to be slow, especially when they involve many variables. In view of their comparative accuracy and high speed of implementation, random forests would become even more attractive if they were shown to be consistent under general data-generating mechanisms. Besides, consistency is almost indispensable in applications of statistical prediction to the estimation of 'causal effects' based on observational data (e.g.
Four Books to start with Machine Learning -- Machine Learning for Beginners. -- Lysten
This book explains the concept of machine learning starting from the very basics of Linear Regression and Logistic Regression, and ends at Multilevel Perceptrons to do Image Recognition. The best part about this book is that it assumes no prior knowledge of machine learning or even computer programming. The only requirements I see are the ability to read basic English and a basic knowledge of high-school-level math. The author has also provided preprocessed data sets and a GitHub repository, so it is easy to start getting your hands dirty as soon as possible. This book is quite basic, but it does the most crucial job of getting even complete laypeople excited about the field of Machine Learning and Deep Learning.
Machine Learning Applications
Last year at the Ignition Community conference, Inductive Automation's Kevin McClusky (co-director of sales engineering) and Kathy Applebaum (senior software engineer) explored the various ways in which machine learning can be applied in industry. In this presentation, they delved deep into the types of machine learning most applicable to industry and the algorithms behind them. You can read more about this 2018 presentation in the article "How to Apply Industrial Machine Learning," which was based on that presentation. At this year's event, McClusky and Applebaum came together again to highlight the integration of more machine learning capabilities into Ignition over the past year, as well as to showcase four industrial use cases of machine learning being explored by Ignition users. Newly available machine learning capabilities in Ignition enable users to take advantage of the Apache Math 3 library previously added to Ignition 7.9.10 just prior to the release of Ignition 8.
#005B Logistic Regression: Scratch vs. Scikit-Learn Master Data Science
Let's now compare Logistic Regression from scratch with Logistic Regression from scikit-learn. Our dataset consists of two classes, class 0 and class 1, which we generated randomly. The training set has 2000 examples drawn from both classes. The test set has 1000 examples, 500 from each class. Python's scikit-learn library provides a LogisticRegression class, and we will apply it to our dataset.
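The "from scratch" side of this comparison might look roughly like the sketch below. It is not the post's own code. The class centers, the even 1000/1000 training split, the learning rate, and the helper names (`make_two_class_data`, `fit_logistic_regression`, `predict`) are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def make_two_class_data(n_per_class, rng):
    """Draw two Gaussian blobs in 2D (assumed layout, not from the post)."""
    x0 = rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, 2))  # class 0
    x1 = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, 2))   # class 1
    X = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y


def fit_logistic_regression(X, y, lr=0.1, n_iter=500):
    """Batch gradient descent on the mean cross-entropy loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)           # predicted P(class 1)
        w -= lr * (X.T @ (p - y)) / n    # gradient w.r.t. weights
        b -= lr * np.mean(p - y)         # gradient w.r.t. bias
    return w, b


def predict(X, w, b):
    """Assign class 1 when the predicted probability is at least 0.5."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)


rng = np.random.default_rng(0)
X_train, y_train = make_two_class_data(1000, rng)  # 2000 training examples
X_test, y_test = make_two_class_data(500, rng)     # 1000 test examples
w, b = fit_logistic_regression(X_train, y_train)
test_accuracy = np.mean(predict(X_test, w, b) == y_test)
print(f"from-scratch test accuracy: {test_accuracy:.3f}")
```

For the scikit-learn side, `sklearn.linear_model.LogisticRegression().fit(X_train, y_train).score(X_test, y_test)` gives the test accuracy on the same arrays. The two numbers should be close but not identical, because scikit-learn applies L2 regularization by default and uses a different solver.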
Distribution-free conditional predictive bands using density estimators
Izbicki, Rafael, Shimizu, Gilson T., Stern, Rafael B.
Conformal methods create prediction bands that control average coverage under no assumptions besides i.i.d. data. Besides average coverage, one might also desire to control conditional coverage, that is, coverage for every new testing point. However, without strong assumptions, conditional coverage is unachievable. Given this limitation, the literature has focused on methods with asymptotic conditional coverage. In order to obtain this property, these methods require strong conditions on the dependence between the target variable and the features. We introduce two conformal methods based on conditional density estimators that do not depend on this type of assumption to obtain asymptotic conditional coverage: Dist-split and CD-split. While Dist-split asymptotically obtains optimal intervals, which are easier to interpret than general regions, CD-split obtains optimal size regions, which are smaller than intervals. CD-split also obtains local coverage by creating a data-driven partition of the feature space that scales to high-dimensional settings and by generating prediction bands locally on the partition elements. In a wide variety of simulated scenarios, our methods have better control of conditional coverage and have smaller length than previously proposed methods.
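For orientation, the baseline that controls only average coverage can be sketched as plain split conformal regression. This is not Dist-split or CD-split, which rely on conditional density estimators. The linear model, the simulated data, and the names `split_conformal_band` and `simulate` are assumptions made for illustration.

```python
import numpy as np


def split_conformal_band(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Plain split conformal band using absolute residuals as scores."""
    # Fit any point predictor on the training split; a least-squares line here.
    slope, intercept = np.polyfit(x_train, y_train, deg=1)

    # Conformity scores on the held-out calibration split.
    cal_scores = np.abs(y_cal - (slope * x_cal + intercept))

    # Use the ceil((n + 1) * (1 - alpha))-th smallest score as the half-width.
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    half_width = np.inf if k > n else np.sort(cal_scores)[k - 1]

    center = slope * x_new + intercept
    return center - half_width, center + half_width


def simulate(n, rng):
    """Hypothetical data: a noisy line with unit-variance Gaussian noise."""
    x = rng.uniform(-2.0, 2.0, size=n)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=n)
    return x, y


rng = np.random.default_rng(0)
x_train, y_train = simulate(500, rng)
x_cal, y_cal = simulate(500, rng)
x_test, y_test = simulate(2000, rng)

lower, upper = split_conformal_band(
    x_train, y_train, x_cal, y_cal, x_test, alpha=0.1
)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical average coverage: {coverage:.3f}")
```

The band has the same width at every input, so it can only target average coverage of about 1 - alpha. Adapting the band to the features is what the conditional-coverage methods in the abstract aim for.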
Machine Learning Algorithms In Layman's Terms, Part 1
As a recent graduate of the Flatiron School's Data Science Bootcamp, I've been inundated with advice on how to ace technical interviews. A soft skill that keeps coming to the forefront is the ability to explain complex machine learning algorithms to a non-technical person. This series of posts is me sharing with the world how I would explain all the machine learning topics I come across on a regular basis...to my grandma. Some get a bit in-depth, others less so, but all I believe are useful to a non-Data Scientist. In the upcoming parts of this series, I'll be going over: "a model is like a Vending Machine, which given an input (money), will give you some output (a soda can maybe) . . . An algorithm is what is used to train a model, all the decisions a model is supposed to take based on the given input, to give an expected output. For example, an algorithm will decide based on the dollar value of the money given, and the product you chose, whether the money is enough or not, how much balance you are supposed to get [back], and so on."
First order expansion of convex regularized estimators
Bellec, Pierre C, Kuchibhotla, Arun K
We consider first order expansions of convex penalized estimators in high-dimensional regression problems with random designs. Our setting includes linear regression and logistic regression as special cases. For a given penalty function $h$ and the corresponding penalized estimator $\hat\beta$, we construct a quantity $\eta$, the first order expansion of $\hat\beta$, such that the distance between $\hat\beta$ and $\eta$ is an order of magnitude smaller than the estimation error $\|\hat{\beta} - \beta^*\|$. In this sense, the first order expansion $\eta$ can be thought of as a generalization of influence functions from the mathematical statistics literature to regularized estimators in high dimensions. Such a first order expansion implies that the risk of $\hat{\beta}$ is asymptotically the same as the risk of $\eta$, which leads to a precise characterization of the MSE of $\hat\beta$; this characterization takes a particularly simple form for isotropic design. Such a first order expansion also leads to inference results based on $\hat{\beta}$. We provide sufficient conditions for the existence of such a first order expansion for three regularizers: the Lasso in its constrained form, the Lasso in its penalized form, and the Group-Lasso. The results apply to general loss functions under some conditions, and those conditions are satisfied for the squared loss in linear regression and for the logistic loss in the logistic model.
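As a reading aid, the three regularizers named above can be written in one common template. This is a sketch in standard notation, not the paper's exact formulation: the loss $\ell$, the sample size $n$, the dimension $p$, the tuning parameters $\lambda$ and $R$, and the groups $G_1, \dots, G_K$ are symbols assumed here, and the paper's normalization may differ.

$$\hat\beta \in \arg\min_{b \in \mathbb{R}^p} \; \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, x_i^\top b) + h(b)$$

$$h_{\text{penalized Lasso}}(b) = \lambda \|b\|_1, \qquad h_{\text{Group-Lasso}}(b) = \lambda \sum_{k=1}^{K} \|b_{G_k}\|_2, \qquad h_{\text{constrained Lasso}}(b) = \begin{cases} 0 & \text{if } \|b\|_1 \le R, \\ +\infty & \text{otherwise.} \end{cases}$$

Taking $\ell(y, u) = (y - u)^2$ gives linear regression with squared loss, and $\ell(y, u) = \log(1 + e^{u}) - y u$ with $y \in \{0, 1\}$ gives the logistic model.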
Robust Hierarchical-Optimization RLS Against Sparse Outliers
Slavakis, Konstantinos, Banerjee, Sinjini
This paper fortifies the recently introduced hierarchical-optimization recursive least squares (HO-RLS) against outliers which infrequently contaminate linear-regression models. Outliers are modeled as nuisance variables and are estimated together with the linear filter/system variables via a sparsity-inducing (non-)convexly regularized least-squares task. The proposed outlier-robust HO-RLS builds on steepest-descent directions with a constant step size (learning rate), needs no matrix inversion (lemma), accommodates colored nominal noise of known correlation matrix, exhibits a small computational footprint, and offers theoretical guarantees, in a probabilistic sense, for the convergence of the system estimates to the solutions of a hierarchical-optimization problem: Minimize a convex loss, which models a-priori knowledge about the unknown system, over the minimizers of the classical ensemble LS loss. Extensive numerical tests on synthetically generated data in both stationary and non-stationary scenarios showcase notable improvements of the proposed scheme over state-of-the-art techniques.