 Regression


Data Science Dictionary

@machinelearnbot

The idea of cross-validation is to split the data into N subsets, set one subset aside, estimate the model's parameters from the remaining N-1 subsets, and use the retained subset to estimate the model's error. This process is repeated N times, with each of the N subsets serving once as the validation set. The N error values obtained in these steps are then combined to give the final estimate of the model error. Cross-validation is used in various classification and prediction procedures, such as regression analysis, discriminant analysis, neural networks, and classification and regression trees (CART). The goal is to improve the quality of decisions made from the outcome of a study on the basis of statistical methods, and to ensure that maximum information is obtained from scarce experimental data.
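
A minimal sketch of the procedure above in Python with NumPy. Ordinary least squares as the model and mean squared error as the error measure are assumptions for illustration; any fit/predict pair could be swapped in. The helper names `k_fold_cv_error`, `fit_ols`, and `predict_ols` are hypothetical.

```python
import numpy as np

def k_fold_cv_error(X, y, n_folds, fit, predict, seed=0):
    """Estimate mean squared prediction error by N-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)  # N disjoint subsets
    fold_errors = []
    for k in range(n_folds):
        val_idx = folds[k]  # the subset put aside
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = fit(X[train_idx], y[train_idx])  # estimate parameters from the other N-1 subsets
        residuals = y[val_idx] - predict(model, X[val_idx])
        fold_errors.append(np.mean(residuals ** 2))  # error on the retained subset
    # Combine the N estimates: here, an average weighted by fold size
    sizes = [len(f) for f in folds]
    return float(np.average(fold_errors, weights=sizes))

def fit_ols(X, y):
    """Ordinary least squares with an intercept (hypothetical example model)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from the intercept and slope coefficients returned by fit_ols."""
    return coef[0] + X @ coef[1:]
```

Weighting the per-fold errors by fold size is one common way to combine them; the text leaves the combination rule open.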


Reinforcement-based Simultaneous Algorithm and its Hyperparameters Selection

arXiv.org Machine Learning

Many algorithms for data analysis exist, especially for classification problems. To solve a data analysis problem, one must choose a suitable algorithm and also select its hyperparameters. In this paper, we present a new method for the simultaneous selection of an algorithm and its hyperparameters. To do so, we reduce this problem to the multi-armed bandit problem: we treat each algorithm as an arm, and a hyperparameter search for that algorithm during a fixed time as the corresponding arm play. We also suggest a problem-specific reward function. We performed experiments on 10 real datasets and compared the suggested method with the existing one implemented in Auto-WEKA. The results show that our method is significantly better in most cases and never worse than Auto-WEKA.
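
The reduction to a bandit problem can be sketched with UCB1, a standard bandit policy. The abstract does not say which bandit algorithm or reward function the paper uses, so UCB1, the reward in [0, 1] returned by the `play` callback (for instance, the best validation score found during one fixed-time search), and the simulated search used for testing are all assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

def ucb1_algorithm_selection(n_algorithms: int,
                             play: Callable[[int], float],
                             n_rounds: int) -> List[int]:
    """UCB1 with one arm per algorithm; returns how often each arm was played."""
    counts = [0] * n_algorithms
    totals = [0.0] * n_algorithms
    for t in range(1, n_rounds + 1):
        if t <= n_algorithms:
            arm = t - 1  # play every arm once so each has a reward estimate
        else:
            # mean reward plus an exploration bonus that shrinks as the arm is played more
            arm = max(range(n_algorithms),
                      key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = play(arm)  # one arm play = one fixed-time hyperparameter search (assumed)
        counts[arm] += 1
        totals[arm] += reward
    return counts

def make_simulated_search(success_rates: Sequence[float], seed: int = 0) -> Callable[[int], float]:
    """Hypothetical stand-in for a real search: reward 1 with a per-algorithm probability."""
    rng = random.Random(seed)
    return lambda arm: 1.0 if rng.random() < success_rates[arm] else 0.0
```

Over many rounds, most of the time budget flows to the algorithm whose searches pay off best, while the exploration bonus keeps weaker algorithms from being abandoned too early.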


Nonlinear variable selection with continuous outcome: a nonparametric incremental forward stagewise approach

arXiv.org Machine Learning

We present a method of variable selection for the situation where some predictors are nonlinearly associated with a continuous outcome variable. The method does not assume any specific functional form and can select from a large number of candidates. It takes the form of incremental forward stagewise regression, in which very small steps are taken to select the variables. Because no functional form is assumed, we devised an approach termed roughening to adjust the residuals in the iterations. In simulations, we show that the new method is competitive with popular machine learning approaches. We also demonstrate its performance on some real datasets.
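
Roughening and the nonparametric fitting are specific to the paper and are not reproduced here. As a point of reference, the sketch below shows the classical linear incremental forward stagewise regression that the method builds on: at each step, the coefficient of the predictor most correlated with the current residual is nudged by a small amount `eps`. The step size, step count, and function name are illustrative assumptions.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=5000):
    """Linear incremental forward stagewise regression on standardized predictors."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so correlations are comparable
    r = y - y.mean()                          # start from the centered response
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        corr = X.T @ r                        # (unnormalized) correlation with the residual
        j = int(np.argmax(np.abs(corr)))      # predictor most correlated with the residual
        delta = eps * np.sign(corr[j])        # very small step in the direction of that correlation
        beta[j] += delta
        r -= delta * X[:, j]                  # update the residual
    return beta  # predictors with nonzero coefficients are the selected ones
```

Because each step is tiny, variables enter one at a time in order of their usefulness, which is what makes the procedure a variable-selection method rather than just a fitting routine.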


Urban Distribution Grid Topology Estimation via Group Lasso

arXiv.org Machine Learning

The growing penetration of distributed energy resources (DERs) in urban areas raises multiple reliability issues. Topology reconstruction is a critical step in ensuring robust distribution grid operation. However, reconstructing bus connectivity and network topology is hard in distribution grids, for two reasons: 1) the branches are challenging and expensive to monitor because they are installed underground; and 2) many studies inappropriately assume a radial topology, whereas urban grids are meshed. To address these drawbacks, we propose a new data-driven approach that reconstructs distribution grid topology using newly available smart meter data. Specifically, a graphical model is built to capture the probabilistic relationships among different voltage measurements. With supporting proofs, the bus connectivity and topology estimation problems are formulated as a linear regression problem with least absolute shrinkage on grouped variables (Group Lasso) to deal with meshed network structures. Simulation results show highly accurate estimation in IEEE standard distribution test systems with and without loops, using real smart meter data.
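
Group Lasso itself can be sketched as penalized least squares solved by proximal gradient descent, where the coefficients in each group are shrunk together and dropped together. This is a generic solver on synthetic data, not the paper's formulation: how voltage measurements are grouped per bus, the penalty weight `lam`, and the sqrt(group size) weighting are assumptions.

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=2000):
    """Minimize (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2 by proximal gradient."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L, with L the largest eigenvalue of X^T X / n
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n   # gradient of the least-squares term
        z = beta - step * grad
        for g in groups:                  # block soft-thresholding, one group at a time
            norm = np.linalg.norm(z[g])
            thresh = step * lam * np.sqrt(len(g))
            if norm <= thresh:
                z[g] = 0.0                # the whole group is removed
            else:
                z[g] = (1.0 - thresh / norm) * z[g]  # the whole group is shrunk
        beta = z
    return beta
```

Zeroing whole groups is what lets a Group Lasso formulation answer yes/no questions about connections rather than about individual coefficients.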


Predicting Car Prices Part 1: Linear Regression

@machinelearnbot

Let's walk through an example of predictive analytics using a data set that most people can relate to: prices of cars. In this case, we have a data set with historical Toyota Corolla prices along with related car attributes. Let's load the Toyota Corolla file and check out the first 5 lines to see what the data set looks like. The data collected in this file for Toyota Corollas are Price, Age, KM (kilometers driven), Fuel Type, HP (horsepower), Automatic or Manual, Number of Doors, and Weight in pounds. In predictive models, there is a response variable (also called the dependent variable), which is the variable we are interested in predicting. The independent variables (the predictors, also called features in the machine learning community) are one or more numeric variables we use to predict the response variable.
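
To make the response/predictor distinction concrete, here is a minimal least-squares sketch. The Toyota Corolla file is not reproduced here, so the data below are made up: only three of the listed predictors (Age, KM, HP) are used, and the coefficients in the hypothetical price formula are arbitrary.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Made-up predictors (independent variables), NOT the Toyota Corolla data
rng = np.random.default_rng(0)
n = 50
age = rng.uniform(1, 80, n)        # hypothetical age in months
km = rng.uniform(0, 200_000, n)    # hypothetical kilometers driven
hp = rng.uniform(70, 200, n)       # hypothetical horsepower
X = np.column_stack([age, km, hp])

# Hypothetical noise-free response variable (Price)
price = 20_000 - 100 * age - 0.02 * km + 30 * hp
intercept, coefs = fit_linear_regression(X, price)
```

With noise-free synthetic data, the fit recovers the values used to generate `price`; on real data, the coefficients only approximate the underlying relationship.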


Three Reasons Why Product Managers Need to Understand Machine Learning and How to Get Started

#artificialintelligence

Product Managers have enthusiastically adopted the data-driven approach to building products and have learnt not to rely solely on experience. For some features, it is a continuous process that supports the Build-Measure-Learn iteration. Intuition backed by data is a product manager's most powerful weapon. If we have already made the shift towards data, then why do we need Machine Learning, you ask? In this post, I am going to share why I believe every Product Manager should understand Machine Learning and where to start.


High-Dimensional $L_2$Boosting: Rate of Convergence

arXiv.org Machine Learning

Boosting is one of the most significant developments in machine learning. This paper studies the rate of convergence of $L_2$Boosting, which is tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called "post-Boosting", a post-selection estimator that applies ordinary least squares to the variables selected in the first stage by $L_2$Boosting. Another variant is "Orthogonal Boosting", where an orthogonal projection is conducted after each step. We show that both post-$L_2$Boosting and Orthogonal Boosting achieve the same rate of convergence as LASSO in a sparse, high-dimensional setting. We show that the rate of convergence of classical $L_2$Boosting depends on the design matrix, as described by a sparse eigenvalue constant. To show the latter results, we derive new approximation results for the pure greedy algorithm, based on analyzing the revisiting behavior of $L_2$Boosting. We also introduce feasible rules for early stopping, which can be easily implemented and used in applied work. Our results also allow a direct comparison between LASSO and boosting, which has been missing from the literature. Finally, we present simulation studies and applications to illustrate the relevance of our theoretical results and to provide insights into the practical aspects of boosting. In these simulation studies, post-$L_2$Boosting clearly outperforms LASSO.
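
A minimal sketch of componentwise $L_2$Boosting followed by the post-Boosting refit described above. It assumes centered data (no intercept), a fixed step size `nu`, and a fixed number of steps in place of the paper's early-stopping rules; Orthogonal Boosting is not shown, and the function names are hypothetical.

```python
import numpy as np

def l2_boost(X, y, nu=0.1, n_steps=200):
    """Componentwise L2Boosting: repeatedly fit the residual on the single best column."""
    beta = np.zeros(X.shape[1])
    r = y.astype(float).copy()
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_steps):
        b = X.T @ r / col_sq                 # univariate least-squares coefficient for each column
        j = int(np.argmax(b ** 2 * col_sq))  # column giving the largest drop in residual sum of squares
        beta[j] += nu * b[j]                 # shrunken update with step size nu
        r -= nu * b[j] * X[:, j]
    return beta

def post_l2_boost(X, y, beta, tol=1e-12):
    """Post-Boosting: refit ordinary least squares on the columns selected by L2Boosting."""
    selected = np.flatnonzero(np.abs(beta) > tol)
    coef = np.zeros(X.shape[1])
    if selected.size > 0:
        sol, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        coef[selected] = sol
    return coef
```

Boosting alone shrinks the selected coefficients toward zero because of the small step size; the OLS refit removes that shrinkage on the selected set, which is the intuition behind the post-selection estimator.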


What are the differences between prediction, extrapolation, and interpolation?

@machinelearnbot

The former belongs to the realm of explanatory models, the latter to the realm of predictive analytics. Explanatory models, often involving linear regression, are concerned with explaining a given phenomenon and finding causal relationships between an output (dependent) variable and a set of input (independent) variables, often very few. The objective is to find a good regression model that fits the data well and meets the underlying assumptions of linear regression. The emphasis here is on hypothesis testing, p-values, confidence intervals, and so on. Once a good model is found, one can use it to estimate the value of the output variable for given values of the input variables. It is OK to estimate an output value by interpolation, but one must use extreme caution when estimating output values by extrapolation, because the regression model is an explanatory model, not a predictive one.
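
One simple guard against unintended extrapolation is to check whether a new input falls inside the range of the data the model was fitted on. The per-feature min-max check below is a minimal sketch and an assumption, not a method from the source. With several input variables, a point can pass it while still lying outside the region covered by the data (for example, [2, 0] when the training points lie on the diagonal), so it flags only the most obvious extrapolation.

```python
import numpy as np

def within_training_range(X_train, x_new):
    """True if every feature of x_new lies inside the min-max range seen in training."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return bool(np.all((x_new >= lo) & (x_new <= hi)))
```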


Contextual Semibandits via Supervised Learning Oracles

arXiv.org Machine Learning

Decision making with partial feedback, motivated by applications including personalized medicine [22] and content recommendation [17], is receiving increasing attention from the machine learning community. These problems are formally modeled as learning from bandit feedback, where a learner repeatedly takes an action and observes a reward for the action, with the goal of maximizing reward. While bandit learning captures many problems of interest, several applications have additional structure: the action is combinatorial in nature and more detailed feedback is provided. For example, in internet applications, we often recommend sets of items and record information about the user's interaction with each individual item (e.g., click). This additional feedback is unhelpful unless it relates to the overall reward (e.g., number of clicks), and, as in previous work, we assume a linear relationship. This interaction is known as the semibandit feedback model. Typical bandit and semibandit algorithms achieve reward that is competitive with the single best fixed action, i.e., the best medical treatment or the most popular news article for everyone. This is often inadequate for recommendation applications: while the most popular articles may get some clicks, personalizing content to the users is much more effective.
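
To make the semibandit feedback model concrete, the sketch below simulates it in its simplest, non-contextual form: each round the learner shows a slate of items, observes a click indicator for every displayed item, and receives their sum as reward (a linear relationship with all weights equal to 1). The epsilon-greedy policy, the click probabilities, and the names are assumptions for illustration; this is not the paper's oracle-based contextual algorithm.

```python
import random
from typing import List, Sequence, Tuple

def semibandit_epsilon_greedy(click_probs: Sequence[float], slate_size: int,
                              n_rounds: int, epsilon: float = 0.1,
                              seed: int = 0) -> Tuple[List[int], int]:
    """Non-contextual semibandit simulation; returns (times each item was shown, total clicks)."""
    rng = random.Random(seed)
    k = len(click_probs)
    shows = [0] * k
    clicks = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            slate = rng.sample(range(k), slate_size)  # explore: a random slate
        else:
            # exploit: items with the highest estimated click rate (unseen items treated optimistically)
            est = [clicks[i] / shows[i] if shows[i] else 1.0 for i in range(k)]
            slate = sorted(range(k), key=lambda i: est[i], reverse=True)[:slate_size]
        # semibandit feedback: one simulated click indicator per displayed item
        feedback = [1 if rng.random() < click_probs[i] else 0 for i in slate]
        for i, c in zip(slate, feedback):
            shows[i] += 1
            clicks[i] += c
        total_reward += sum(feedback)  # linear overall reward: number of clicks
    return shows, total_reward
```

Because it ignores context, this policy can at best match the single best fixed slate, which is exactly the baseline the passage argues is inadequate for personalized recommendation.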


Illustrated Guide to ROC and AUC

#artificialintelligence

Think of a regression model mapping a number of features onto a real number (potentially a probability). The resulting real number can then be mapped to one of two classes, depending on whether the predicted number is greater or less than some chosen threshold. Let's take as an example a logistic regression on data about survivorship in the Titanic disaster to introduce the relevant concepts, which lead naturally to the ROC (Receiver Operating Characteristic) curve and its AUC or AUROC (Area Under the ROC Curve). Every record in the data set represents a passenger, providing information on her/his age, gender, class, number of siblings/spouses aboard (sibsp), number of parents/children aboard (parch) and, of course, whether s/he survived the accident. The logistic regression model is tested on batches of 10 cases, with a model trained on the remaining N-10 cases; the test batches form a partition of the data. In short, Leave-10-out CV has been applied to arrive at a more accurate estimate of the out-of-sample error rates.
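
The threshold sweep described above can be sketched directly: sort cases by predicted score, lower the threshold one distinct score at a time, and record the false positive rate and true positive rate at each step. The AUC is then the area under those points by the trapezoidal rule. This plain-Python sketch assumes labels coded 0/1, both classes present, and a case predicted positive when its score is at least the threshold; it does not reproduce the Titanic analysis.

```python
from typing import List, Sequence, Tuple

def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs as the threshold is lowered through each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ordered = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # move all cases tied at this score across the threshold together
        while i < len(ordered) and ordered[i][0] == threshold:
            if ordered[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, scores [0.9, 0.8, 0.7, 0.6] with labels [1, 0, 1, 0] give an AUC of 0.75: three of the four positive/negative pairs are ranked correctly.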