 Regression


Machine Learning

#artificialintelligence

Problems of this nature occur in fields as diverse as business, medicine, astrophysics, and public policy. Suppose we observe $Y_i$ and $X_i = (X_{i1}, \dots, X_{ip})$ for $i = 1, \dots, n$. We believe that there is a relationship between Y and at least one of the X's. We can model the relationship as $Y_i = f(X_i) + \varepsilon_i$, where f is an unknown function and ε is a random error with mean zero. Statistical learning, and this course, are all about how to estimate f. The term statistical learning refers to using the data to "learn" f. Why do we care about estimating f? There are two reasons for estimating f: prediction and inference.
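As a minimal sketch of what "learning f" can look like, the following simulates data from Y = f(X) + ε and estimates f by least squares. The choice of a linear f, the noise level, and the names `simulate`, `fit_line`, and `true_f` are assumptions made for illustration; they do not come from the course material.

```python
import random


def simulate(n, f, noise_sd, seed=0):
    """Draw n points from Y = f(X) + eps, with eps ~ Normal(0, noise_sd)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def true_f(x):
    # Hypothetical "unknown" function, chosen only for this illustration.
    return 2.0 + 3.0 * x


xs, ys = simulate(500, true_f, noise_sd=1.0)
intercept, slope = fit_line(xs, ys)


def f_hat(x):
    """The estimate of f learned from the simulated data."""
    return intercept + slope * x


print(f"estimated f(x) = {intercept:.2f} + {slope:.2f} * x")
```

Calling `f_hat` on a new x corresponds to prediction, while reading off `slope` to see how Y changes with X corresponds to inference.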


Making data science accessible – Logistic Regression

@machinelearnbot

Regression is a modelling technique for predicting the values of an outcome variable from one or more explanatory variables. Logistic Regression is a specific approach for describing a binary outcome variable (for example, yes/no). Let's assume you own a new boutique shop. You have a list of potential clients you are thinking of inviting to a special event with the aim of maximizing the number of sales. Who should you invite? Data on previous events you have run is a great starting point here, allowing you to predict an individual's likelihood of buying given the information you have on them.
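The boutique scenario can be sketched roughly as follows: fit a one-feature logistic regression by gradient descent, then rank candidate clients by predicted purchase probability. The data, the feature (number of past visits), and the client names are all invented for illustration; a real analysis would more likely use an established statistics library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical records from previous events: visits so far, and whether they bought.
past_visits = [0, 1, 1, 2, 3, 4, 5, 6]
bought = [0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(past_visits, bought)

# Hypothetical invitation list: name -> number of past visits.
candidates = {"Ana": 0, "Ben": 2, "Cho": 5}
ranked = sorted(
    ((name, sigmoid(b0 + b1 * visits)) for name, visits in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, prob in ranked:
    print(f"{name}: estimated probability of buying = {prob:.2f}")
```

Inviting from the top of `ranked` downward is one simple way to act on the predicted likelihoods.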


4 Reasons Your Machine Learning Model is Wrong (and How to Fix It)

#artificialintelligence

There are a number of machine learning models to choose from. We can use Linear Regression to predict a value, Logistic Regression to classify distinct outcomes, and Neural Networks to model non-linear behaviors. When we build these models, we always use a set of historical data to help our machine learning algorithms learn the relationship between a set of input features and a predicted output. But even if a model can accurately predict a value from historical data, how do we know it will work as well on new data? Or, more plainly, how do we evaluate whether a machine learning model is actually "good"?
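One common way to approach that question is to hold out part of the historical data and compare the error on the data the model saw with the error on the data it did not. The sketch below uses a 1-nearest-neighbour regressor because it memorizes its training set, which makes the gap easy to see. The simulated data and the 80/20 split are assumptions for illustration, not taken from the article.

```python
import random


def make_data(n, seed):
    """Simulated history: y = 1 + 2x plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]


def predict_1nn(train, x):
    """Predict the y of the training point whose x is closest."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def mse(train, points):
    """Mean squared error of the 1-NN model on the given points."""
    return sum((predict_1nn(train, x) - y) ** 2 for x, y in points) / len(points)


data = make_data(200, seed=1)
random.Random(2).shuffle(data)

split = int(0.8 * len(data))  # 80% for training, 20% held out
train, test = data[:split], data[split:]

train_mse = mse(train, train)  # error on data the model has memorized
test_mse = mse(train, test)    # error on data the model has never seen

print(f"train MSE = {train_mse:.3f}, held-out MSE = {test_mse:.3f}")
```

A model that looks perfect on its own training data can still make sizeable errors on held-out data, so the held-out figure is the more honest estimate of how it will behave on new inputs.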


Online Active Linear Regression via Thresholding

arXiv.org Machine Learning

We consider the problem of online active learning to collect data for regression modeling. Specifically, we consider a decision maker with a limited experimentation budget who must efficiently learn an underlying linear population model. Our main contribution is a novel threshold-based algorithm for selecting the most informative observations; we characterize its performance and fundamental lower bounds. We extend the algorithm and its guarantees to sparse linear regression in high-dimensional settings. Simulations suggest the algorithm is remarkably robust: it provides significant benefits over passive random sampling in real-world datasets that exhibit high nonlinearity and high dimensionality, significantly reducing both the mean and variance of the squared error.


Logistic Regression - Hosmer Lemeshow test

@machinelearnbot

Hi, when evaluating predictions, look at the initial breakdown in the data. You can get a good overall hit rate (I use 80% as a simple rule of thumb), but looking at the data, what were your sensitivity and specificity? In other words, does your model classify both sets of conditions (outcome a and outcome b) you are modelling well? Having a high percentage of cases in one group, and getting them classified correctly, can really make your overall hit rate misleading. I would check your residuals (the difference between the expected value, as a probability, and the observed value), see which cases you are misclassifying, and which ones you are misclassifying really badly, and perhaps then try to profile them. Also, remember that statistical significance can be boosted by sample size (power), and if you have a lot of cases, your predictors can be significant.
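To make the point about a misleading hit rate concrete, here is a small sketch with made-up labels: 10 positives, 90 negatives, and a model that always predicts the negative class. The function name `classification_summary` is just an illustrative choice.

```python
def classification_summary(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)  # share of actual positives that were caught
    specificity = tn / (tn + fp)  # share of actual negatives that were caught
    return accuracy, sensitivity, specificity


# Made-up, imbalanced example: 10 positives and 90 negatives.
y_true = [1] * 10 + [0] * 90
# A model that always predicts the majority class.
y_pred = [0] * 100

accuracy, sensitivity, specificity = classification_summary(y_true, y_pred)
print(f"hit rate = {accuracy:.0%}, sensitivity = {sensitivity:.0%}, "
      f"specificity = {specificity:.0%}")
```

The overall hit rate clears the 80% rule of thumb, yet the model never identifies a single positive case, which is exactly what the sensitivity figure exposes.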


Machine Learning, Linear and Bayesian Models for Logistic Regression in Failure Detection Problems

arXiv.org Machine Learning

In this work, we study the use of logistic regression in manufacturing failure detection. As a data set for the analysis, we used the data from the Kaggle competition Bosch Production Line Performance. We considered the use of machine learning, linear, and Bayesian models. For the machine learning approach, we analyzed the XGBoost tree-based classifier to obtain high-scoring classification. Using the generalized linear model for logistic regression makes it possible to analyze the influence of the factors under study. The Bayesian approach for logistic regression gives the statistical distribution for the parameters of the model. It can be useful in probabilistic analysis, e.g. risk assessment.


Source code for Robust Ridge and Linear Regression with Bootstrap

@machinelearnbot

Allows you to set up bounds on the regression parameters (similar to ridge regression). Does not use matrix inversion, thus numerically stable. Robust parameter estimation based on Monte-Carlo simulations and re-sampling. The source code can easily be modified to perform logistic regression. This package can be used by scientists, programmers, analysts or engineers with limited statistical knowledge.
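The package's source is not shown here, so the following is only a rough sketch of the general idea of resampling-based estimation, not the package's actual method. It bootstraps a single-predictor ridge slope, which has a closed form and therefore needs no matrix inversion. The ridge penalty `lam`, the simulated data, and all function names are assumptions for illustration.

```python
import random


def ridge_slope(xs, ys, lam):
    """Closed-form ridge slope for one centered predictor (no matrix inversion)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx + lam)  # lam > 0 shrinks the slope toward zero


def bootstrap_slopes(xs, ys, lam, n_boot, seed=0):
    """Refit the slope on n_boot resamples drawn with replacement; return sorted."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)
        slopes.append(ridge_slope([xs[i] for i in idx], [ys[i] for i in idx], lam))
    return sorted(slopes)


# Simulated data with a true slope of 3 (an assumption for this sketch).
rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]

slopes = bootstrap_slopes(xs, ys, lam=1.0, n_boot=1000)
lower, upper = slopes[25], slopes[974]  # approximate 95% percentile interval

print(f"ridge slope, bootstrap 95% interval: [{lower:.2f}, {upper:.2f}]")
```

The spread of the resampled slopes gives a sense of how stable the estimate is without relying on distributional formulas.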


Going Deeper into Regression Analysis with Assumptions, Plots & Solutions

@machinelearnbot

Regression analysis marks the first step in predictive modeling. No doubt, it's fairly easy to implement. Neither its syntax nor its parameters create any kind of confusion. But merely running one line of code doesn't serve the purpose. Regression tells much more than that!


Network-Guided Biomarker Discovery

arXiv.org Machine Learning

Identifying measurable genetic indicators (or biomarkers) of a specific condition of a biological system is a key element of precision medicine. Indeed, it allows diagnostic, prognostic, and treatment choices to be tailored to the individual characteristics of a patient. In machine learning terms, biomarker discovery can be framed as a feature selection problem on whole-genome data sets. However, classical feature selection methods are usually underpowered to process these data sets, which contain orders of magnitude more features than samples. This can be addressed by making the assumption that genetic features that are linked on a biological network are more likely to work jointly towards explaining the phenotype of interest. We review here three families of methods for feature selection that integrate prior knowledge in the form of networks.