Regression


Going Deeper into Regression Analysis with Assumptions, Plots & Solutions

@machinelearnbot

This article on going deeper into regression analysis with assumptions, plots & solutions was posted by Manish Saraswat. Manish, who works in marketing and data science at Analytics Vidhya, believes that education can change this world. R, data science and machine learning keep him busy. Regression analysis marks the first step in predictive modeling. No doubt, it's fairly easy to implement.


[P] Linear Regression with Python • /r/MachineLearning

#artificialintelligence

You can solve that optimisation problem either with gradient descent or with the closed-form solution (the normal equation), although for really big datasets the former is preferable, since calculating the matrix inverse would be very computationally costly.
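A minimal NumPy sketch of the two routes, assuming ordinary least-squares linear regression on synthetic data (the data and the helper names `add_bias`, `fit_closed_form` and `fit_gradient_descent` are illustrative, not from the original post). The closed form uses `np.linalg.solve` rather than an explicit inverse, but its cost still grows roughly cubically with the number of features, while each gradient step costs O(m·p) for m rows and p features.

```python
import numpy as np

def add_bias(X):
    """Prepend a column of ones so theta[0] acts as the intercept."""
    return np.column_stack([np.ones(len(X)), X])

def fit_closed_form(X, y):
    """Least squares via the normal equation (X^T X) theta = X^T y.

    np.linalg.solve avoids forming the inverse explicitly, but solving this
    p-by-p system still costs O(p^3) for p columns.
    """
    Xb = add_bias(X)
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

def fit_gradient_descent(X, y, lr=0.1, n_iters=5000):
    """Batch gradient descent on the mean squared error (1/m) * ||Xb @ theta - y||^2."""
    Xb = add_bias(X)
    m = len(y)
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residuals = Xb @ theta - y
        grad = (2.0 / m) * (Xb.T @ residuals)  # gradient of the MSE w.r.t. theta
        theta -= lr * grad
    return theta

# Synthetic data (an assumption for illustration): y = 3 + 2*x1 - x2 + noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.1, size=200)

theta_cf = fit_closed_form(X, y)
theta_gd = fit_gradient_descent(X, y)
print("closed form:     ", np.round(theta_cf, 4))
print("gradient descent:", np.round(theta_gd, 4))
```

On a problem this small both routes agree; the learning rate and iteration count are hand-picked for this data and would need tuning elsewhere.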


Logistic Regression using python

@machinelearnbot

This article was posted by Arpan Gupta (Indian Institute of Technology). Let's learn from a precise demo on fitting logistic regression to the Titanic data set for machine learning. Description: On April 15, 1912, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This tragedy has led to better safety regulations for ships.
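For readers who want the mechanics before the demo, here is a minimal sketch of fitting logistic regression by batch gradient descent on the mean log-loss. It does not use the Titanic data: the two numeric features and the "true" weights are hypothetical, generated only so the fit can be checked.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, X):
    """P(y = 1 | x) under weights w, where w[0] is the intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return sigmoid(Xb @ w)

def fit_logistic(X, y, lr=0.5, n_iters=5000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    Xb = np.column_stack([np.ones(len(X)), X])  # bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        p = sigmoid(Xb @ w)                 # predicted P(y = 1)
        grad = Xb.T @ (p - y) / len(y)      # gradient of the mean log-loss
        w -= lr * grad
    return w

# Hypothetical toy data (NOT the Titanic data set): labels drawn from a known logistic model.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
true_w = np.array([-0.5, 2.0, -1.0])
y = (rng.uniform(size=500) < predict_proba(true_w, X)).astype(float)

w = fit_logistic(X, y)
accuracy = np.mean((predict_proba(w, X) > 0.5) == y)
print("weights:", np.round(w, 2), "training accuracy:", round(float(accuracy), 3))
```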


How seasonal components can be represented as sinusoids in a regression model.

@machinelearnbot

There was a verbal solution to this problem in the members-only section. I'm not sure if it's "legal" to share the whole thing, but here is an excerpt of the solution: "The time series has a weekly periodicity with two peaks, Monday and Thursday, corresponding respectively to the publication of the Monday and Thursday digests. The impact of the Monday and Thursday email blasts extends over the next day; this makes measuring the yield more difficult, unless you use additional data, e.g. from our newsletter vendor. However, the bulk of the impact is really on Monday and Thursday."
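One common way to encode such a weekly pattern is to add sine/cosine pairs with period 7 (Fourier terms) as regressors next to an intercept and a trend, then fit by ordinary least squares. The sketch below assumes a made-up daily series (day 0 = Monday) with spikes on Mondays and Thursdays and some spill-over onto the following day; the helper names and numbers are hypothetical. A single sinusoid can produce only one peak per week, so at least two harmonics are needed for two peaks.

```python
import numpy as np

def fourier_features(t, period=7.0, n_harmonics=3):
    """Columns sin(2*pi*k*t/period) and cos(2*pi*k*t/period) for k = 1..n_harmonics."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

def fit_seasonal(t, y, period=7.0, n_harmonics=3):
    """Least-squares fit of y ~ intercept + linear trend + sinusoidal seasonal terms."""
    t = np.asarray(t, dtype=float)
    X = np.column_stack([np.ones_like(t), t, fourier_features(t, period, n_harmonics)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef

# Hypothetical daily series: Monday and Thursday bumps, smaller spill-over on Tuesday and Friday.
days = np.arange(8 * 7)
weekday = days % 7
base = 100 + 0.5 * days
bump = np.select([weekday == 0, weekday == 3, weekday == 1, weekday == 4],
                 [50.0, 40.0, 15.0, 12.0], default=0.0)
y = base + bump

coef, fitted = fit_seasonal(days, y)
print("fitted weekly effect, Mon..Sun:", np.round(fitted[:7] - base[:7], 1))
```

With period 7, three harmonics plus the intercept can represent any weekday pattern exactly (equivalent to day-of-week dummies); using fewer harmonics gives a smoother, more constrained seasonal shape.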


Linear Regression with Python

#artificialintelligence

Let's start with a simple problem. Suppose we have a small dataset of house prices for a specific area of a city. It contains two fields, the size of the house and its price (SIZE, PRICE), and I would like to know the price of a house of a specific size. The problem is that this size does not appear in my dataset, so what should I do? We already know from the title that the solution is linear regression, but to make the explanation easier, I've collected a small dataset of house prices; the table below shows a snippet of it:

Visualization helps a lot in identifying patterns in data, so to get a better view of our dataset I'm going to plot it using the matplotlib Python library:

From the plot we can see that the price grows with the size, but the points don't form a perfect line that would let us predict the price for a new size. So we need to find a linear function h(x) that passes close to all the points, but not necessarily through them. We call this function the hypothesis:

In equation 2, m is the size of our dataset, Xi is the ith size and Yi is the ith price in the dataset, and we call J the error function (or objective function) that we need to minimize. There are other error functions, or estimators, in statistics that we could use, but in our case we'll use the MSE (mean squared error) estimator, because it makes our unknown parameters easier to find. Our function becomes:

The estimator J takes two arguments, θ0 and θ1, so plotting J against both of them gives a surface in 3D; Figure 3 shows what it looks like. Our goal here is to find the minimum value, which is the lowest point in the graph below: imagine putting a ball inside the graph, and the ball will slide to the bottom of the shape. To find the lowest point of the shape, in other words to minimize the objective function, we'll use the gradient descent algorithm, which is very simple to understand.

To reach the bottom of the shape, we first choose a random point on the graph, which means setting θ0 and θ1 to random values. At that point we need to decide: do we need to go up or down?
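A minimal sketch of this procedure, assuming the hypothesis $h(x) = \theta_0 + \theta_1 x$ and the common cost $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}(h(x_i) - y_i)^2$ (the 1/2 factor is a convention and does not move the minimum). The sizes and prices are hypothetical, not the author's dataset; sizes are in hundreds of square metres so a single learning rate `alpha` works without feature scaling. The partial derivatives answer the "up or down?" question: each parameter moves against its gradient.

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2), with h(x) = theta0 + theta1 * x."""
    m = len(x)
    return ((theta0 + theta1 * x - y) ** 2).sum() / (2 * m)

def gradient_descent(x, y, alpha=0.1, n_iters=5000, seed=0):
    """Minimise J by repeatedly stepping theta0 and theta1 downhill."""
    rng = np.random.default_rng(seed)
    theta0, theta1 = rng.normal(size=2)      # random starting point on the J surface
    m = len(x)
    for _ in range(n_iters):
        error = theta0 + theta1 * x - y      # h(x_i) - y_i for every example
        # Partial derivatives of J: their sign tells us which way is "down".
        d_theta0 = error.sum() / m
        d_theta1 = (error * x).sum() / m
        # Update both parameters simultaneously, scaled by the learning rate alpha.
        theta0, theta1 = theta0 - alpha * d_theta0, theta1 - alpha * d_theta1
    return theta0, theta1

# Hypothetical data: size in hundreds of square metres, price in thousands.
sizes = np.array([0.5, 0.8, 1.0, 1.2, 1.5, 2.0])
prices = np.array([150.0, 210.0, 260.0, 300.0, 370.0, 480.0])

theta0, theta1 = gradient_descent(sizes, prices)
print(f"h(x) = {theta0:.2f} + {theta1:.2f} * x")
print("cost:", compute_cost(theta0, theta1, sizes, prices))
print("predicted price for a size not in the data (1.3):", theta0 + theta1 * 1.3)
```

If `alpha` is too large the updates overshoot and J grows instead of shrinking; too small and convergence is slow.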


Bayesian Learning of Consumer Preferences for Residential Demand Response

arXiv.org Machine Learning

In coming years residential consumers will face real-time electricity tariffs with energy prices varying day to day, and effective energy saving will require automation: a recommender system that learns the consumer's preferences from her actions. A consumer chooses a scenario of home appliance use to balance her comfort level and the energy bill. We propose a Bayesian learning algorithm to estimate the comfort level function from the history of appliance use. In numerical experiments with datasets generated from a simulation model of a consumer interacting with small home appliances, the algorithm outperforms popular regression analysis tools. Our approach can be extended to control an air heating and conditioning system, which is responsible for up to half of a household's energy bill.


Modelling Competitive Sports: Bradley-Terry-Élő Models for Supervised and On-Line Learning of Paired Competition Outcomes

arXiv.org Machine Learning

Prediction and modelling of competitive sports outcomes has received much recent attention, especially from the Bayesian statistics and machine learning communities. In the real-world setting of outcome prediction, the seminal Élő update still remains, after more than 50 years, a valuable baseline which is difficult to improve upon, though in its original form it is a heuristic and not a proper statistical "model". Mathematically, the Élő rating system is very closely related to the Bradley-Terry models, which are usually used in an explanatory fashion rather than in a predictive supervised or on-line learning setting. Exploiting this close link between these two model classes and some newly observed similarities, we propose a new supervised learning framework with close similarities to logistic regression, low-rank matrix completion and neural networks. Building on it, we formulate a class of structured log-odds models, unifying the desirable properties found in the above: supervised probabilistic prediction of scores and wins/draws/losses, batch/epoch and on-line learning, as well as the possibility to incorporate features in the prediction, without having to sacrifice simplicity, parsimony of the Bradley-Terry models, or computational efficiency of Élő's original approach. We validate the structured log-odds modelling approach in synthetic experiments and English Premier League outcomes, where the added expressivity yields the best predictions reported in the state of the art, close to the quality of contemporary betting odds.
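To make the Élő/Bradley-Terry link concrete, here is a small sketch of the classic Élő expected-score formula (base 10, scale 400) and its on-line update, next to the Bradley-Terry win probability written as a logistic function of a strength difference. This is the textbook rating system, not the paper's structured log-odds model; the K-factor of 32 and the example ratings are assumptions for illustration.

```python
import math

def elo_expected(r_a, r_b):
    """Expected score of A against B under the classic Elo formula (base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32.0):
    """One on-line Elo update; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    delta = k * (score_a - elo_expected(r_a, r_b))
    return r_a + delta, r_b - delta

def bradley_terry_prob(theta_a, theta_b):
    """Bradley-Terry P(A beats B) with strengths exp(theta): a logistic function of theta_a - theta_b."""
    return 1.0 / (1.0 + math.exp(-(theta_a - theta_b)))

# Elo's expected score equals the Bradley-Terry probability after rescaling
# ratings by ln(10)/400, i.e. a logistic model on the rating difference.
scale = math.log(10.0) / 400.0
print(elo_expected(1600, 1400), bradley_terry_prob(1600 * scale, 1400 * scale))

# Hypothetical game: two 1500-rated players, A wins.
print(elo_update(1500.0, 1500.0, 1.0))
```

Read this way, the update `k * (score - expected)` has the shape of a stochastic-gradient step on the log-loss of that logistic model, which is one way to see why the two model classes are so closely related.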


Subset Selection for Multiple Linear Regression via Optimization

arXiv.org Machine Learning

Regression analysis is a statistical methodology for predicting values of response (dependent) variables from a set of explanatory (independent) variables by investigating the relationships among the variables. It is used for forecasting and prediction in a variety of areas, from economics to biology. When the relationship among the variables is expressed as a linear equation and the set of explanatory variables has more than one variable, it is termed multiple linear regression. The multiple linear regression model is the most popular among the various variants of regression analysis. Given a fixed set of explanatory variables, the goal of multiple linear regression is to find the coefficients for the explanatory variables that minimize the fitting error.
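As a baseline for the subset-selection problem, the sketch below fits multiple linear regression by least squares (minimizing the residual sum of squares) and then searches exhaustively over all column subsets of a given size. This brute-force search is not the optimization formulation the paper proposes; it needs one fit per subset, which is exactly what becomes infeasible as the number of candidate variables grows. The data and helper names are hypothetical.

```python
import itertools
import numpy as np

def ols_rss(X, y):
    """Fit y ~ intercept + X by ordinary least squares; return (RSS, coefficients)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ coef
    return float(resid @ resid), coef

def best_subset(X, y, k):
    """Brute-force search over all size-k column subsets; keep the one with the lowest RSS."""
    best = None
    for cols in itertools.combinations(range(X.shape[1]), k):
        rss, coef = ols_rss(X[:, list(cols)], y)
        if best is None or rss < best[0]:
            best = (rss, cols, coef)
    return best

# Hypothetical data: only columns 1 and 4 actually drive y.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
y = 1.0 + 3.0 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(scale=0.5, size=100)

rss, cols, coef = best_subset(X, y, k=2)
print("selected columns:", cols, "coefficients:", np.round(coef, 2))
```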


A Kaggle Master Explains Gradient Boosting

#artificialintelligence

This tutorial was originally posted here on Ben's blog, GormAnalysis. If linear regression were a Toyota Camry, then gradient boosting would be a UH-60 Black Hawk helicopter. A particular implementation of gradient boosting, XGBoost, is consistently used to win machine learning competitions on Kaggle. It's also been butchered to death by a host of drive-by data scientists' blogs. As such, the purpose of this article is to lay the groundwork for classical gradient boosting, intuitively and comprehensively.
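A compact sketch of classical gradient boosting for squared-error regression: start from a constant model, then repeatedly fit a depth-1 tree (a stump) to the current residuals, which are the negative gradient of the squared loss, and add a shrunken copy of it. This is plain NumPy, not XGBoost, and the one-feature data, learning rate and number of rounds are illustrative assumptions.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares depth-1 tree on a 1-D feature: one threshold, one constant per side."""
    best = None
    for thr in np.unique(x)[:-1]:            # the largest value would leave the right side empty
        left = x <= thr
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1:]                          # (threshold, left value, right value)

def predict_stump(stump, x):
    thr, lv, rv = stump
    return np.where(x <= thr, lv, rv)

def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared error: each new stump is fit to the current residuals."""
    f0 = float(y.mean())                     # start from the best constant model
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred                 # negative gradient of 0.5 * (y - pred)^2
        stump = fit_stump(x, residuals)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return f0, learning_rate, stumps

def predict(model, x):
    f0, learning_rate, stumps = model
    out = np.full(len(x), f0)
    for stump in stumps:
        out = out + learning_rate * predict_stump(stump, x)
    return out

# Hypothetical 1-D data for illustration: a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 200)
y = np.sin(x) + rng.normal(0.0, 0.1, size=x.size)

model = gradient_boost(x, y)
mse_const = np.mean((y - y.mean()) ** 2)
mse_boost = np.mean((y - predict(model, x)) ** 2)
print(f"constant model MSE: {mse_const:.4f}, boosted MSE: {mse_boost:.4f}")
```

The small learning rate (shrinkage) means each stump corrects only part of the remaining error, trading more rounds for a smoother, less overfit ensemble.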


R Tutorial with Bayesian Statistics Using OpenBUGS

#artificialintelligence

This text provides R tutorials on statistics, including hypothesis testing, ANOVA and linear regression. It fulfills popular demand from users of r-tutor.com for exercise solutions and offline access. Part III of the text is about Bayesian statistics. It begins with closed-form analytic solutions and basic BUGS models for simple examples. Then it covers OpenBUGS for Bayesian ANOVA and regression analysis.