 scikit learn library


Why You Shouldn't Use pandas.get_dummies For Machine Learning

#artificialintelligence

The Pandas library is a staple of machine learning projects. However, some of its tools are not well suited to training models. One of the clearest examples is the get_dummies function, which is used for one-hot encoding. Here, we provide a quick rundown of one-hot encoding in Pandas and explain why it isn't suited for machine learning tasks. Let's start with a quick refresher on how to one-hot encode variables with Pandas.
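As a minimal sketch of that refresher, the snippet below one-hot encodes a toy column with get_dummies. The data, the column name "color", and the train/test split are assumptions made up for illustration. It also shows one commonly cited concern, which this excerpt does not itself spell out: get_dummies derives its columns from whatever data it is given, so separately encoded training and test sets can end up with different columns. Scikit-Learn's OneHotEncoder is shown only as one possible alternative that remembers the categories seen during fitting.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: the column name and values are assumptions.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["red", "yellow"]})  # "yellow" never appears in train

# get_dummies builds one column per category present in the frame it receives.
train_dummies = pd.get_dummies(train, columns=["color"])
test_dummies = pd.get_dummies(test, columns=["color"])

train_columns = list(train_dummies.columns)
test_columns = list(test_dummies.columns)
print(train_columns)
print(test_columns)  # differs from train_columns, so the feature layouts do not match

# OneHotEncoder learns its categories from the training data and reuses them.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train[["color"]])
test_encoded = encoder.transform(test[["color"]]).toarray()
print(test_encoded.shape)  # same number of columns as categories seen in train
```

A model trained on the three training columns could not consume the two-column test frame directly, whereas the fitted encoder always produces the same column layout.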


Linear Regression in Python for Data Scientists

#artificialintelligence

Linear regression is a statistical method for modelling the relationship between two or more quantities. The aim is either to better understand the existing relationship or to predict behaviour at points for which we currently don't have data. In the simplest case, with a single input variable, the model is a straight line described by two parameters: the slope (m) and the intercept (c). Fitting the model, typically by least squares, means choosing the m and c that minimise the sum of squared differences between the observed values and the line, which gives the line of best fit. That line can then be plotted to understand the relationship or used to estimate values at unknown points. Libraries such as Scikit-Learn and statsmodels have linear regression functionality built in, which makes this simple.
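To make the slope and intercept concrete, here is a small sketch that computes them by hand with the standard least squares formulas and then checks the result against Scikit-Learn's LinearRegression. The data points are invented for illustration, and the variable names are assumptions rather than anything taken from the article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, roughly linear data made up for this example.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Closed-form least squares estimates for a single input variable:
#   m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
#   c = mean(y) - m * mean(x)
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

# The same fit with Scikit-Learn, which expects a 2D feature array.
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)

print("manual:  m =", m, "c =", c)
print("sklearn: m =", model.coef_[0], "c =", model.intercept_)

# Estimate the value at a point that is not in the data.
x_new = 6.0
y_new = m * x_new + c
print("prediction at x = 6:", y_new)
```

Both approaches should agree up to floating point error, since LinearRegression solves the same ordinary least squares problem.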