 Regression


Logistic Regression Vs Decision Trees Vs SVM: Part I

@machinelearnbot

Classification is one of the major problems we solve when working on standard business problems across industries. In this article we'll discuss three of the major techniques used for it: Logistic Regression, Decision Trees and Support Vector Machines [SVM]. All of these algorithms are used for classification [SVM and Decision Trees are also used for regression, but we are not discussing that today!]. Time and again I have seen people ask which one to choose for their particular problem. The classic, most correct, but least satisfying answer to that question is "it depends!".
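One concrete way to see how these models differ is by what each one optimizes. The minimal Python sketch below is an illustration, not any particular library's implementation. It writes the logistic (log) loss minimized by logistic regression and the hinge loss minimized by a soft-margin linear SVM as functions of the margin y·f(x), with labels y in {-1, +1}. It also includes the Gini impurity that CART-style decision trees commonly use to choose splits.

```python
import math
from typing import Sequence


def logistic_loss(margin: float) -> float:
    """Log-loss of logistic regression as a function of the margin y * f(x), y in {-1, +1}."""
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def hinge_loss(margin: float) -> float:
    """Hinge loss of a soft-margin linear SVM: exactly zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def gini_impurity(class_counts: Sequence[int]) -> float:
    """Gini impurity of a tree node, given how many training examples of each class it holds."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


if __name__ == "__main__":
    for m in (-2.0, 0.0, 1.0, 2.0):
        print(f"margin {m:+.1f}: logistic {logistic_loss(m):.3f}, hinge {hinge_loss(m):.3f}")
    print("Gini of a 50/50 node:", gini_impurity([5, 5]))
    print("Gini of a pure node:", gini_impurity([10, 0]))
```

The hinge loss is exactly zero for confidently correct points (margin of at least 1), so only points near the boundary shape the SVM. The log-loss never quite reaches zero, so every point keeps some influence on the logistic fit. A tree ignores margins altogether and just looks for splits that make nodes purer.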


Evaluating the Performance of Offensive Linemen in the NFL

arXiv.org Machine Learning

How does one objectively measure the performance of an individual offensive lineman in the NFL? The existing literature proposes various measures that rely on subjective assessments of game film, but has yet to develop an objective methodology to evaluate performance. Using a variety of statistics related to an offensive lineman's performance, we develop a framework to objectively analyze the overall performance of an individual offensive lineman and determine specific linemen who are overvalued or undervalued relative to their salary. We identify eight players across the 2013-2014 and 2014-2015 NFL seasons that are considered to be overvalued or undervalued and corroborate the results with existing metrics that are based on subjective evaluation. To the best of our knowledge, the techniques set forth in this work have not been utilized in previous works to evaluate the performance of NFL players at any position, including offensive linemen.


Beginners Guide: Apache Spark Machine Learning with Large Data

#artificialintelligence

This informative tutorial walks us through using Spark's machine learning capabilities and Scala to train a logistic regression classifier on a larger-than-memory dataset.
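Training on larger-than-memory data works because a logistic-regression gradient is a sum over examples. That sum can be accumulated one chunk at a time and the partial sums combined, much as Spark aggregates work across partitions. The Python sketch below is a single-machine illustration of that idea only. It is not Spark's API or its actual optimizer. `read_chunks` and `toy_chunks` are hypothetical stand-ins for whatever reads the data from disk in pieces.

```python
import math
from typing import Callable, Iterable, Iterator, List, Tuple

# One training example: (feature vector, label in {0, 1}).
Example = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def chunk_gradient(w: List[float], b: float, chunk: Iterable[Example]) -> Tuple[List[float], float, int]:
    """Sum of log-loss gradients over one chunk (the per-partition 'map' step)."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    count = 0
    for x, y in chunk:
        error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
        for j, xj in enumerate(x):
            grad_w[j] += error * xj
        grad_b += error
        count += 1
    return grad_w, grad_b, count


def train_streaming(
    read_chunks: Callable[[], Iterator[List[Example]]],
    n_features: int,
    epochs: int = 500,
    learning_rate: float = 0.5,
) -> Tuple[List[float], float]:
    """Full-batch gradient descent where each pass streams the data chunk by chunk."""
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        total_w = [0.0] * n_features
        total_b = 0.0
        total_n = 0
        for chunk in read_chunks():  # only one chunk is held in memory at a time
            grad_w, grad_b, count = chunk_gradient(w, b, chunk)
            total_w = [t + g for t, g in zip(total_w, grad_w)]  # combine partial sums ('reduce')
            total_b += grad_b
            total_n += count
        # One gradient-descent step on the mean log-loss.
        w = [wi - learning_rate * g / total_n for wi, g in zip(w, total_w)]
        b -= learning_rate * total_b / total_n
    return w, b


def toy_chunks() -> Iterator[List[Example]]:
    """Hypothetical reader: yields a tiny one-feature data set two rows at a time."""
    data: List[Example] = [([0.0], 0), ([1.0], 0), ([3.0], 1), ([4.0], 1)]
    for start in range(0, len(data), 2):
        yield data[start:start + 2]


if __name__ == "__main__":
    weights, bias = train_streaming(toy_chunks, n_features=1)
    print("P(y=1 | x=0) =", round(sigmoid(weights[0] * 0.0 + bias), 3))
    print("P(y=1 | x=4) =", round(sigmoid(weights[0] * 4.0 + bias), 3))
```

Because each epoch sees every chunk before taking a step, this is ordinary batch gradient descent. Only the memory footprint changes, not the result.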


Machine Learning Methods: Classification without negative examples - EFavDB

#artificialintelligence

Here, we discuss some methods for carrying out classification when only positive examples are available. The latter half of our discussion borrows heavily from W.S. Lee and B. Liu, Proc. Logistic regression is a commonly used tool for estimating the level sets of a Boolean function y on a set of feature vectors \textbf{F}, via the model $P(y = 1 \mid \textbf{F}) = 1 / (1 + e^{-\beta \cdot \textbf{F}})$ with a learned weight vector $\beta$. In a sense, you can think of it as a method for playing the game "Battleship" on whatever data set you're interested in. Consider now a situation where all the training examples given are positive, i.e., no negative examples are available.
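Roughly, Lee and Liu's idea is to treat the unlabeled examples as negatives and fit a weighted logistic regression in which the labeled positives carry more weight than the unlabeled examples. The Python sketch below illustrates that idea only. The weight values, the one-dimensional feature and the toy data are hypothetical. It does not reproduce the paper's procedure for choosing the weights.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_pu_weighted(
    positives: List[float],
    unlabeled: List[float],
    pos_weight: float = 2.0,  # hypothetical weight for labeled positives
    unl_weight: float = 1.0,  # hypothetical weight for unlabeled examples treated as negatives
    epochs: int = 500,
    learning_rate: float = 0.1,
) -> Tuple[float, float]:
    """Weighted 1-D logistic regression: positives labeled 1, unlabeled examples labeled 0."""
    data = [(x, 1, pos_weight) for x in positives] + [(x, 0, unl_weight) for x in unlabeled]
    total_weight = sum(wt for _, _, wt in data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y, wt in data:
            error = sigmoid(w * x + b) - y
            grad_w += wt * error * x
            grad_b += wt * error
        # Gradient step on the weighted mean log-loss.
        w -= learning_rate * grad_w / total_weight
        b -= learning_rate * grad_b / total_weight
    return w, b


if __name__ == "__main__":
    positives = [3.0, 3.5, 4.0]
    unlabeled = [0.0, 0.5, 1.0, 3.8]  # mostly negatives, plus one hidden positive at 3.8
    w, b = fit_pu_weighted(positives, unlabeled)
    for x in sorted(unlabeled):
        print(f"unlabeled x={x}: score {sigmoid(w * x + b):.3f}")
```

The scores are best read as a ranking. Sorting the unlabeled examples by score is one way to surface likely hidden positives, such as the point at 3.8 here.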


Automated Linear Regression for Really, Really Big Data

@machinelearnbot

Inora is not claiming that the free version of the RAE Linear Regression software can solve complex data challenges. The linear regression is just an application of the generalized core Math Engine to that specific task (y = mx + b). Of course, the free version is limited to a 2-class task (separating what lies on the line from what does not). The Inora Math Engine is also capable of detecting and analyzing multiple patterns in data sets of any size. So, even though the free linear regression software is limited, it will demonstrate how the Math Engine core differs from traditional statistics, random sampling, and least-squares approaches.
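For comparison with the traditional least-squares approach the snippet mentions, here is a minimal Python sketch of ordinary least squares for y = mx + b, using the closed-form slope and intercept. It says nothing about how the Inora engine itself works.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = s_xy / s_xx
    b = y_mean - m * x_mean  # the fitted line passes through (x_mean, y_mean)
    return m, b


if __name__ == "__main__":
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # points on y = 2x + 1 -> (2.0, 1.0)
```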


Learn the Concept of linearity in Regression Models

@machinelearnbot

This tutorial covers the basics of linear regression, discussing in depth the concept of linearity and which type of linearity is desirable. In linear regression, "linear" always means linearity in the parameters, irrespective of linearity in the explanatory variables. Here the variable X can enter non-linearly, i.e. as X or X², and we can still consider this a linear regression. However, suppose our parameters are not linear, i.e. say the regression equation is … A function Y = f(X) is said to be linear in X if X appears with a power (index) of 1 only. Y is linearly related to X if the rate of change of Y with respect to X (dY/dX) is independent of the value of X. B2 is linear but B1 is non-linear, but what if we transform?
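To make "linear in the parameters" concrete, consider the model Y = B0 + B1·X + B2·X². It is nonlinear in X but linear in B0, B1 and B2, so ordinary least squares still applies once X² is treated as just another column. The Python sketch below builds that design matrix and solves the normal equations with Gaussian elimination. The coefficient names and the noise-free example data are chosen here for illustration.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares [B0, B1, B2] for Y = B0 + B1*X + B2*X**2 via the normal equations."""
    design = [[1.0, x, x * x] for x in xs]  # X**2 is just another column
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    return solve(xtx, xty)


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
    ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]  # exact quadratic, no noise
    print([round(c, 6) for c in fit_quadratic(xs, ys)])
```

The normal equations are fine for a tiny, well-conditioned example like this one. For real data, a QR- or SVD-based least-squares solver is the numerically safer choice.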


Predicting Flights Delay Using Supervised Learning, Logistic Regression

@machinelearnbot

In this post, we'll use a supervised machine learning technique called logistic regression to predict delayed flights. But before we proceed, I would like to offer condolences to the families of the victims of the Germanwings tragedy. Note: this is a common data set in the machine learning community for testing algorithms and models, given that it's publicly available and sizable. In this blog, we will look at a small sample snapshot (2201 flights in January 2004). In another post, we can explore using Big Data technologies such as Hadoop MapReduce or Spark machine learning libraries to do large-scale predictive analytics and data mining.


Linear regression on an unusual domain: hyperplane, sphere or simplex

@machinelearnbot

I was wondering if you are aware of any methodology for performing multivariate linear regression on non-standard spaces or domains. I am trying to reverse-engineer the recipe for the Coca-Cola beverage. The response, Y, is how close my recipe is to the actual formula, based on a number of tastings performed by a number of different people according to a design-of-experiments plan. Indeed, it's quite similar to a clinical trial in which a mix of atoms or chemical radicals (each combination producing a unique molecule) is tested to optimize a drug. The independent variables are binary, each one representing an ingredient: salt, water, corn syrup, etc.


The battle between optimization and curve-fitting

#artificialintelligence

In a recent interview with Managing Editor Dan Collins, legendary trader William Eckhardt talked about the battle between optimization and curve fitting. While you always hope your great idea will lead to a winning system, how you select data to test that system can equally lead to success or failure. The excerpt below is from William Eckhardt: The man who launched 1,000 systems. Bill Eckhardt: By trying to improve your system you can make it worse. You can over-fit to past data or maybe just do something that is statistically invalid.
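Eckhardt's warning about over-fitting to past data can be illustrated with a small simulation. This is an illustration of the general point, not something described in the interview. Generate many "strategies" whose daily returns are pure noise, pick the one with the best average on past data, and then check it on fresh data. The in-sample winner looks profitable only because it was selected. The strategy count, sample sizes and seed below are arbitrary assumptions.

```python
import random
from statistics import mean
from typing import Tuple


def best_of_random_strategies(
    n_strategies: int = 200,
    n_days: int = 250,
    seed: int = 42,
) -> Tuple[float, float]:
    """Return (in-sample mean, out-of-sample mean) daily return of the in-sample winner.

    Every strategy's returns are independent Gaussian noise with zero true edge.
    """
    rng = random.Random(seed)
    past = [[rng.gauss(0.0, 1.0) for _ in range(n_days)] for _ in range(n_strategies)]
    future = [[rng.gauss(0.0, 1.0) for _ in range(n_days)] for _ in range(n_strategies)]
    # "Optimize" by choosing the strategy that did best on past data.
    winner = max(range(n_strategies), key=lambda i: mean(past[i]))
    return mean(past[winner]), mean(future[winner])


if __name__ == "__main__":
    in_sample, out_of_sample = best_of_random_strategies()
    print(f"in-sample mean return:     {in_sample:+.3f}")
    print(f"out-of-sample mean return: {out_of_sample:+.3f}")
```

With these settings, the winner's in-sample average typically sits a few standard errors above zero, while its out-of-sample average is centered on zero. That gap between an optimized backtest and fresh data is the one the excerpt warns about.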


Regression, Logistic Regression and Maximum Entropy

#artificialintelligence

One of the most important tasks in Machine Learning is classification. Classification is used to make an accurate prediction of the class of entries in the test set (a dataset whose entries have not been labelled yet) with a model constructed from a training set. You could think of classifying crime in the field of Pre-Policing, classifying patients in the Health sector, or classifying houses in the Real-Estate sector. Another field in which classification is big is Natural Language Processing (NLP). This is the field of science whose goal is to make machines (computers) understand (written) human language.
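The "Maximum Entropy" classifier in the title is commonly identified with multinomial logistic regression. Each class c gets a weight vector w_c, and p(c | x) = exp(w_c · x) / Σ_k exp(w_k · x). Below is a minimal Python sketch of the prediction step only, using hand-picked hypothetical weights and class names rather than trained ones.

```python
import math
from typing import Dict, List


def softmax_predict(weights: Dict[str, List[float]], x: List[float]) -> Dict[str, float]:
    """Class probabilities p(c | x) of a multinomial logistic (maximum entropy) model."""
    scores = {c: sum(wi * xi for wi, xi in zip(w, x)) for c, w in weights.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


if __name__ == "__main__":
    # Hypothetical weights for three classes over two features plus a bias (last entry of x is 1).
    weights = {
        "sports": [2.0, -1.0, 0.0],
        "politics": [-1.0, 2.0, 0.0],
        "other": [0.0, 0.0, 0.5],
    }
    probs = softmax_predict(weights, [1.0, 0.0, 1.0])
    print(max(probs, key=probs.get), probs)
```

With two classes, this reduces to ordinary (binary) logistic regression, since only the difference between the two weight vectors matters.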