Regression


Predicting Car Prices Part 2: Using Neural Network

@machinelearnbot

This is part two of the series. In part one, we used a linear regression model to predict the prices of used Toyota Corollas. Some of the material overlaps with part one for the benefit of first-time readers; if you have already read part one, which used linear regression, you can safely skip to the section where I apply neural networks to the same data set. In this post, we will use neural networks!


The best kept secret about linear and logistic regression

@machinelearnbot

All the regression theory developed by statisticians over the last 200 years (related to the general linear model) is useless. Regression can be performed just as accurately without statistical models, including the computation of confidence intervals (for estimates, predicted values, or regression parameters). The non-statistical approach is also more robust than the theory described in statistics textbooks and taught in statistics courses. It does not require Map-Reduce when data is really big, nor any matrix inversion, maximum likelihood estimation, or mathematical optimization (such as Newton's algorithm). It is indeed incredibly simple, robust, easy to interpret, and easy to code (no statistical libraries required).
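The excerpt does not show the author's non-statistical method, so the sketch below is not it. As a stand-in built from standard techniques, it illustrates two related points: a single-predictor least-squares fit needs no matrix inversion, and a percentile bootstrap produces a confidence interval for the slope by resampling the data rather than from textbook formulas. The function names and the data are hypothetical.

```python
import random
import statistics


def ols_slope_intercept(xs, ys):
    """Least-squares line for a single predictor, in closed form (no matrix inversion)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the slope."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # all x identical: slope undefined, skip this resample
            continue
        slopes.append(ols_slope_intercept(bx, by)[0])
    slopes.sort()
    lo = slopes[int(alpha / 2 * len(slopes))]
    hi = slopes[int((1 - alpha / 2) * len(slopes)) - 1]
    return lo, hi


# Hypothetical data: y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

slope, intercept = ols_slope_intercept(xs, ys)
lo, hi = bootstrap_slope_ci(xs, ys)
print(f"slope = {slope:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```

The bootstrap still assumes a linear relationship; what it drops is the distributional theory behind the usual standard-error formula.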


Decision tree vs Logistic Regression

@machinelearnbot

There is no clear-cut choice between them; the main difference is that logistic regression is parametric, while IDT is non-parametric. What you need to know is that they give you much of the same information, but through different approaches, and that each is preferable in certain situations. For example, IDT can be very helpful when you want rules for creating your segments. Also, when you have no idea what your data looks like, IDT is a good place to start.
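To make the "rules for creating segments" point concrete, here is a minimal sketch of the simplest possible decision tree, a depth-1 "stump" that picks the single threshold split with the lowest Gini impurity and reports it as a readable rule. It stands in for decision trees in general and is not a specific IDT algorithm; the feature names and data are hypothetical.

```python
def best_stump(rows, labels, feature_names):
    """Depth-1 decision tree for 0/1 labels: choose the threshold split with the
    lowest weighted Gini impurity and return it as a human-readable rule."""

    def gini(ys):
        # Binary Gini impurity: 1 - p^2 - (1 - p)^2 = 2p(1 - p).
        if not ys:
            return 0.0
        p = sum(ys) / len(ys)
        return 2 * p * (1 - p)

    best = None
    n = len(rows)
    for j, name in enumerate(feature_names):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must send rows both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, t, sum(left) / len(left), sum(right) / len(right))
    _, name, t, p_left, p_right = best
    return f"IF {name} <= {t} THEN P(y=1) = {p_left:.2f} ELSE P(y=1) = {p_right:.2f}"


# Hypothetical customers: (age, income) and whether they responded.
rows = [(22, 30), (25, 60), (30, 45), (35, 80), (40, 50), (45, 70), (50, 40), (55, 90)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rule = best_stump(rows, labels, ["age", "income"])
print(rule)
```

A logistic regression on the same data would instead return one weight per feature; the tree's output is a segment definition you can read directly.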


Debugging Machine Learning Tasks

arXiv.org Machine Learning

Unlike traditional programs (such as operating systems or word processors), which have large amounts of code, machine learning tasks use programs with relatively small amounts of code (written in machine learning libraries) but voluminous amounts of data. Just as developers of traditional programs debug errors in their code, developers of machine learning tasks debug and fix errors in their data. However, algorithms and tools for debugging and fixing errors in data are less common than their counterparts for detecting and fixing errors in code. In this paper, we consider classification tasks where errors in training data lead to misclassifications in test points, and propose an automated method to find the root causes of such misclassifications. Our root cause analysis is based on Pearl's theory of causation, and uses Pearl's PS (Probability of Sufficiency) as a scoring metric. Our implementation, Psi, encodes the computation of PS as a probabilistic program, and uses recent work on probabilistic programs and transformations on probabilistic programs (along with gray-box models of machine learning algorithms) to efficiently compute PS. Psi is able to identify root causes of data errors in interesting data sets.
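The abstract names Pearl's Probability of Sufficiency but not how Psi computes it. As background only: Pearl shows that under two assumptions, exogeneity and monotonicity, PS can be identified from observed conditional probabilities as PS = [P(y|x) - P(y|x')] / [1 - P(y|x')]. The sketch below encodes just that formula; it is not Psi's probabilistic-program approach, and the function name and input values are hypothetical.

```python
def probability_of_sufficiency(p_y_given_x, p_y_given_not_x):
    """PS = [P(y|x) - P(y|x')] / [1 - P(y|x')].

    Valid only under exogeneity and monotonicity. Monotonicity implies
    P(y|x) >= P(y|x'), so inputs violating that are rejected.
    """
    if not 0.0 <= p_y_given_not_x < 1.0 or not 0.0 <= p_y_given_x <= 1.0:
        raise ValueError("probabilities out of range (P(y|x') must be < 1)")
    if p_y_given_x < p_y_given_not_x:
        raise ValueError("inconsistent with monotonicity: P(y|x) < P(y|x')")
    return (p_y_given_x - p_y_given_not_x) / (1.0 - p_y_given_not_x)


# Hypothetical numbers: y occurs 90% of the time when x holds and 20% when it does not.
ps = probability_of_sufficiency(0.9, 0.2)
print(f"PS = {ps:.3f}")
```

Read loosely, PS is the probability that introducing x would produce y in cases where neither x nor y was originally present.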


Predicting Glaucoma Visual Field Loss by Hierarchically Aggregating Clustering-based Predictors

arXiv.org Machine Learning

This study addresses the problem of predicting glaucomatous visual field loss from patient disease datasets. Our goal is to accurately predict the progress of the disease in individual patients. Because very few measurements are available for each patient, it is difficult to produce good predictors for individuals. A recently proposed clustering-based method improves prediction by using data from patients with similar spatiotemporal patterns: each patient is assigned to a cluster of patients, and a predictive model is constructed using all of the data in that cluster. Predictions are highly dependent on the quality of clustering, but it is difficult to identify the best clustering method. We therefore propose a method for aggregating cluster-based predictors to obtain better prediction accuracy than any single cluster-based prediction. Further, the method achieves very high performance by hierarchically aggregating experts generated from several cluster-based methods. We use real datasets to demonstrate that our method performs significantly better than conventional clustering-based and patient-wise regression methods, because the hierarchical aggregation strategy has a mechanism whereby good predictors in a small community can thrive.
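The abstract does not specify the aggregation rule. As an illustration of the general idea only, the sketch below uses a standard exponentially weighted average of experts, in which each cluster-based predictor is weighted by how well it has predicted so far. It is not claimed to be the paper's method; the function name, the learning rate `eta`, and all values are hypothetical.

```python
import math


def aggregate(expert_preds, past_sq_errors, eta=1.0):
    """Exponentially weighted average of expert predictions.

    Each expert's weight is proportional to exp(-eta * cumulative squared error),
    so experts that predicted well in the past dominate the combined prediction.
    """
    losses = [sum(errs) for errs in past_sq_errors]
    best = min(losses)  # subtracting the minimum keeps exp() from underflowing to 0 for every expert
    raw = [math.exp(-eta * (loss - best)) for loss in losses]
    total = sum(raw)
    weights = [w / total for w in raw]
    combined = sum(w * p for w, p in zip(weights, expert_preds))
    return combined, weights


# Three hypothetical cluster-based predictors for one patient.
preds = [-2.0, -5.0, -2.4]
past_errors = [[0.1, 0.2], [1.0, 1.5], [0.3, 0.1]]
combined, weights = aggregate(preds, past_errors)
print(f"combined prediction = {combined:.3f}, weights = {[round(w, 3) for w in weights]}")
```

A hierarchical variant could apply the same rule first within each clustering method and then across methods; whether that matches the paper's hierarchy is not stated in the abstract.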


Nuclear norm penalization and optimal rates for noisy low rank matrix completion

arXiv.org Machine Learning

This paper deals with the trace regression model where $n$ entries or linear combinations of entries of an unknown $m_1\times m_2$ matrix $A_0$ corrupted by noise are observed. We propose a new nuclear norm penalized estimator of $A_0$ and establish a general sharp oracle inequality for this estimator for arbitrary values of $n,m_1,m_2$ under the condition of isometry in expectation. Then this method is applied to the matrix completion problem. In this case, the estimator admits a simple explicit form and we prove that it satisfies oracle inequalities with faster rates of convergence than in previous work. They are valid, in particular, in the high-dimensional setting $m_1m_2\gg n$. We show that the obtained rates are optimal up to logarithmic factors in a minimax sense and also derive, for any fixed matrix $A_0$, a non-minimax lower bound on the rate of convergence of our estimator, which coincides with the upper bound up to a constant factor. Finally, we show that our procedure provides an exact recovery of the rank of $A_0$ with probability close to 1. We also discuss the statistical learning setting where there is no underlying model determined by $A_0$ and the aim is to find the best trace regression model approximating the data.
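The abstract does not state the form of the new estimator. For orientation only, the block below writes out the trace regression model it describes and a generic nuclear-norm-penalized least-squares estimator; since the paper presents its estimator as new, it should not be assumed to match this form. Here $\xi_i$ denotes the noise, $\sigma_j(A)$ the singular values of $A$, and $\lambda > 0$ a tuning parameter, all notation introduced for this sketch. In matrix completion, each $X_i$ is a matrix unit $e_j e_k^\top$, so $\operatorname{tr}(X_i^\top A) = A_{jk}$ simply reads off one entry.

```latex
Y_i = \operatorname{tr}\!\left(X_i^{\top} A_0\right) + \xi_i,
\qquad i = 1, \dots, n, \qquad A_0 \in \mathbb{R}^{m_1 \times m_2},

\|A\|_* = \sum_{j=1}^{\min(m_1, m_2)} \sigma_j(A),

\hat{A} \in \arg\min_{A \in \mathbb{R}^{m_1 \times m_2}}
\left\{ \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i - \operatorname{tr}(X_i^{\top} A) \bigr)^2
+ \lambda \|A\|_* \right\}.
```

The nuclear norm $\|A\|_*$ plays the role for matrix rank that the $\ell_1$ norm plays for sparsity in the Lasso.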


Hidden decision trees revisited

@machinelearnbot

Note that in the logistic regression, we use constrained regression coefficients. These coefficients depend on 2 or 3 top parameters and have the same sign as the correlation between the rule they represent and the response (or score). This makes the regression insensitive to high cross-correlations among the "independent" variables (rules), which in this case are not actually independent. This approach is similar to ridge regression, logic regression, or Lasso regression. The regression is used to fine-tune the top parameters associated with the regression coefficients.
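The excerpt does not define the "2 or 3 top parameters", so the following is a hypothetical one-parameter illustration of the sign constraint, not the author's scheme, and it uses a linear rather than logistic fit for brevity. Each rule's coefficient is a single scale parameter `a` times that rule's correlation with the response, and `a` is kept non-negative, so every coefficient has the same sign as its rule's correlation. The rules and responses are made up.

```python
import statistics


def pearson(u, v):
    """Pearson correlation (assumes neither input is constant)."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def sign_constrained_fit(rules, y):
    """rules: one 0/1 column per rule. Coefficient_j = a * corr(rule_j, y), with a >= 0."""
    r = [pearson(col, y) for col in rules]
    n = len(y)
    # Combined score z_i = sum_j r_j * rule_j[i]; fit y ~ b + a * z by simple least squares.
    z = [sum(r[j] * rules[j][i] for j in range(len(rules))) for i in range(n)]
    mz, my = statistics.fmean(z), statistics.fmean(y)
    szz = sum((zi - mz) ** 2 for zi in z)
    a = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz if szz > 0 else 0.0
    a = max(a, 0.0)  # a >= 0 keeps each coefficient's sign equal to its correlation's sign
    b = my - a * mz
    return b, [a * rj for rj in r]


rules = [
    [1, 1, 0, 0, 1, 0],  # hypothetical rule A
    [1, 0, 1, 0, 1, 0],  # hypothetical rule B
    [0, 0, 1, 1, 0, 1],  # rule C: exact complement of A, perfectly collinear with it
]
y = [5, 4, 1, 0, 5, 1]
b, coefs = sign_constrained_fit(rules, y)
print(f"intercept = {b:.3f}, coefficients = {[round(c, 3) for c in coefs]}")
```

Ordinary least squares with an intercept could not separate rules A and C here because they are perfectly collinear, whereas this constrained fit still returns coefficients with the expected signs.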


Understanding Linear Regression

@machinelearnbot

Linear regression is arguably one of the most widely used techniques in data science, yet a thorough understanding of it is far from universal. First, a little history: the term regression was first used by Sir Francis Galton, a 19th-century polymath. Galton was a pioneer in applying statistical methods to many branches of science. He studied the relative sizes of parents and their offspring in various species of plants and animals. During this study he observed that a larger-than-average parent tends to produce a larger-than-average child, but the child is likely to be less large than the parent in terms of its relative position in its own generation.
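Galton's observation can be reproduced with an ordinary least-squares fit. When both variables are standardized, the fitted slope equals the correlation coefficient r, and because |r| ≤ 1 the predicted child sits closer to the average than the parent, which is the "regression" Galton named. The sketch below uses made-up heights purely for illustration.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: slope = Sxy / Sxx."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def standardize(values):
    """Express each value as standard deviations from the mean."""
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical parent and child heights (inches).
parents = [64, 66, 68, 70, 72]
children = [66, 66.5, 68, 69, 70]

slope_std, _ = fit_line(standardize(parents), standardize(children))
# On standardized data the slope equals the correlation r, and |r| <= 1:
# a parent 2 SD above average predicts a child only 2 * r SD above average.
print(f"standardized slope r = {slope_std:.3f}")
print(f"predicted child position for a +2 SD parent: {2 * slope_std:.3f} SD")
```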


Deep Learning Tutorial part 2/3: Artificial Neural Networks - Lazy Programmer

#artificialintelligence

This is part 2/3 of a series on deep learning and deep belief networks. This section focuses on artificial neural networks (ANNs), building on the logistic regression model we learned about last time. It'll be a little shorter because part 1 already built the foundation for some very important topics, namely the objective (error) function and gradient descent. We will focus on two main functions of ANNs: the forward pass (prediction) and backpropagation (learning). Your scikit-learn analogues would be model.predict()
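The tutorial's own code is not part of the excerpt, so here is a hypothetical pure-Python sketch of the two functions it names: a forward pass through one hidden layer of sigmoid units, and backpropagation of the cross-entropy error (the logistic regression objective) followed by a gradient-descent step. The class and method names, layer sizes, and learning rate are illustrative choices.

```python
import math
import random


def sigmoid(a):
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


class TinyANN:
    """One hidden layer of sigmoid units and a single sigmoid output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]  # input -> hidden
        self.b = [0.0] * n_hidden                                  # hidden biases
        self.v = [rng.gauss(0, 1) for _ in range(n_hidden)]        # hidden -> output
        self.c = 0.0                                               # output bias

    def forward(self, x):
        """Forward pass: returns hidden activations and the output probability."""
        z = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(self.W, self.b)]
        y = sigmoid(sum(vj * zj for vj, zj in zip(self.v, z)) + self.c)
        return z, y

    def predict(self, X):
        return [self.forward(x)[1] for x in X]

    def loss(self, X, T):
        """Mean cross-entropy, the same objective as logistic regression."""
        eps = 1e-12
        return -sum(t * math.log(y + eps) + (1 - t) * math.log(1 - y + eps)
                    for y, t in zip(self.predict(X), T)) / len(X)

    def backprop_step(self, X, T, lr=0.5):
        """Accumulate gradients over the batch, then take one gradient-descent step."""
        H, D = len(self.v), len(X[0])
        gW = [[0.0] * D for _ in range(H)]
        gb, gv, gc = [0.0] * H, [0.0] * H, 0.0
        for x, t in zip(X, T):
            z, y = self.forward(x)
            d_out = y - t  # dLoss/d(output pre-activation) for sigmoid + cross-entropy
            gc += d_out
            for j in range(H):
                gv[j] += d_out * z[j]
                d_hid = d_out * self.v[j] * z[j] * (1 - z[j])  # chain rule through hidden sigmoid
                gb[j] += d_hid
                for i in range(D):
                    gW[j][i] += d_hid * x[i]
        n = len(X)
        for j in range(H):
            self.v[j] -= lr * gv[j] / n
            self.b[j] -= lr * gb[j] / n
            for i in range(D):
                self.W[j][i] -= lr * gW[j][i] / n
        self.c -= lr * gc / n


# XOR: not linearly separable, so plain logistic regression cannot fit it.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
T = [0, 1, 1, 0]
net = TinyANN(n_in=2, n_hidden=4, seed=1)
loss_before = net.loss(X, T)
for _ in range(5000):
    net.backprop_step(X, T)
loss_after = net.loss(X, T)
print(f"loss: {loss_before:.3f} -> {loss_after:.3f}")
print([round(p, 2) for p in net.predict(X)])
```

All gradients are computed from the current weights before any weight is updated, which is why the update loop runs only after the whole batch has been processed.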


R Users Will Now Inevitably Become Bayesians

#artificialintelligence

There are several reasons why not everyone uses Bayesian methods for regression modeling. One reason is that Bayesian modeling requires more thought: you need pesky things like priors, and you can't assume that the answers are valid just because a procedure runs without throwing an error. A second reason is that MCMC sampling, the bedrock of practical Bayesian modeling, can be slow compared to closed-form or MLE procedures. A third reason is that existing Bayesian solutions have either been highly specialized (and thus inflexible) or have required knowing how to use a general-purpose tool like BUGS, JAGS, or Stan. This third obstacle has recently been removed in the R world by not one but two packages: brms and rstanarm.
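To show what "MCMC sampling" does under the hood, here is a minimal random-walk Metropolis sampler for the slope of a one-predictor Bayesian regression, assuming a known noise standard deviation and a normal prior. It is an illustration in Python, not how brms or rstanarm work internally; both build on Stan, whose default sampler (NUTS) is considerably more sophisticated. The data, prior, and step size are hypothetical.

```python
import math
import random
import statistics


def log_posterior(beta, xs, ys, sigma=1.0, prior_sd=10.0):
    """Unnormalized log posterior for y = beta * x + noise.

    Gaussian likelihood with known noise sd `sigma`; Normal(0, prior_sd^2) prior on beta.
    """
    log_lik = -sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / (2 * sigma ** 2)
    log_prior = -beta ** 2 / (2 * prior_sd ** 2)
    return log_lik + log_prior


def metropolis(xs, ys, n_samples=5000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the single slope parameter."""
    rng = random.Random(seed)
    beta = 0.0
    current = log_posterior(beta, xs, ys)
    samples = []
    for _ in range(n_samples):
        proposal = beta + rng.gauss(0, step)  # symmetric proposal
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        samples.append(beta)
    return samples


# Hypothetical data: true slope 1.5, noise sd 1.
rng = random.Random(7)
xs = [i / 10 for i in range(1, 31)]
ys = [1.5 * x + rng.gauss(0, 1) for x in xs]

draws = metropolis(xs, ys)[1000:]  # discard burn-in
post_mean = statistics.fmean(draws)

# This model is conjugate, so the exact posterior mean is available for comparison.
analytic_mean = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + 1.0 / 10.0 ** 2)
print(f"MCMC posterior mean = {post_mean:.3f}, exact = {analytic_mean:.3f}")
```

The conjugate check is only possible because the model is so simple; for realistic models there is no closed form, which is exactly where sampling (and its cost) comes in.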