I graduated from the Warsaw University of Technology with a master's thesis on a text-mining topic (intelligent web-crawling methods). I work for a Polish IT consulting company (Sollers Consulting), where I design and develop various insurance-industry systems (one of them is an insurance fraud detection platform). From time to time I compete in data mining contests (the Netflix Prize, competitions on Kaggle and tunedit.org). As far as I remember, I defined the basis of the solution at the very beginning: create a separate predictor for each individual loop and time interval. So my solution required me to build 61 × 10 = 610 regression models.
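The idea of one predictor per segment can be sketched as a dictionary of independently fitted models, keyed by (loop, interval). This is only a minimal illustration with made-up data and a plain least-squares fit, not the contest solution itself; the segment counts are shrunk from 61 × 10 to keep it small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data; the real contest used 61 loops x 10 intervals.
n_loops, n_intervals = 3, 2
models = {}
for loop in range(n_loops):
    for interval in range(n_intervals):
        # Each segment gets its own synthetic features and target.
        X = rng.normal(size=(50, 4))
        y = X @ rng.normal(size=4) + rng.normal(scale=0.1, size=50)
        # One least-squares fit per segment, stored under its key.
        Xb = np.c_[np.ones(50), X]               # prepend a bias column
        w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        models[(loop, interval)] = w

print(len(models))  # 3 * 2 = 6 separate predictors
```

At prediction time you simply look up the model for the segment a data point belongs to.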
When I started reading articles on neural networks, I struggled to understand the basics of how they work. I kept reading more and more articles online, grabbed the key points, and put them together into private notes. Then I thought I would publish them to help others understand the basics too; it is fun to know the fundamentals of any domain. The perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt.
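To make the perceptron concrete, here is a minimal sketch of Rosenblatt's learning rule trained on the AND function. The toy data, learning rate, and epoch count are my own choices for illustration.

```python
import numpy as np

# A minimal Rosenblatt perceptron trained on the AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(20):                      # a few epochs suffice on this toy task
    for xi, target in zip(X, y):
        pred = int(w @ xi + b > 0)       # step activation
        w += lr * (target - pred) * xi   # perceptron update rule
        b += lr * (target - pred)

preds = [int(w @ xi + b > 0) for xi in X]
print(preds)  # [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop eventually stops making mistakes.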
Machine learning and statistics have many applications in business and the social sciences. However, the theory is often intimidating and not easily understood. In this series of articles, I aim to demystify the concepts behind the common tools used in data science and machine learning, starting with linear regression. Linear regression is a statistical method that allows us to describe relationships between variables (distinct things that can be measured or recorded, such as height, weight, and hair colour). It is an extension of the General Linear Model, a framework to describe how a variable of interest can be modelled using other predictor variables. In simple linear regression (SLR), we focus on the relationship between two continuous variables, x and y (hence, simple).
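As a quick taste of SLR, the sketch below fits a line to synthetic data generated from a known relationship (invented here purely for illustration) and recovers the intercept and slope by ordinary least squares.

```python
import numpy as np

# Simple linear regression y = b0 + b1*x, fit by ordinary least squares.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)                      # predictor, e.g. height
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)   # response, with noise

b1, b0 = np.polyfit(x, y, deg=1)   # returns (slope, intercept)
print(round(b1, 2), round(b0, 2))  # close to the true 3.0 and 2.0
```

The fitted coefficients land near the true values of 2 and 3; the gap shrinks as the sample grows or the noise decreases.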
We have all built a logistic regression model at some point in our lives. Even those who have never built one have probably learned this predictive modelling technique in theory. Weight of evidence and information value are two simple, undervalued concepts used in the preprocessing step of building a logistic regression model, and I would like to bring them back into the limelight through this article. First things first: we all know logistic regression is used for classification problems.
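To show what these two concepts compute, here is a small sketch of weight of evidence and information value for one binned feature against a binary target. The bin names and counts are made up for illustration; the formulas are the standard ones, WoE = ln(% events / % non-events) per bin and IV = Σ (% events − % non-events) × WoE.

```python
import math

# Hypothetical counts of events (y=1) and non-events (y=0) per bin.
bins = {
    "low":    (10, 90),
    "medium": (40, 60),
    "high":   (80, 20),
}
total_e = sum(e for e, _ in bins.values())
total_ne = sum(ne for _, ne in bins.values())

iv = 0.0
for name, (e, ne) in bins.items():
    pe, pne = e / total_e, ne / total_ne   # share of events / non-events
    woe = math.log(pe / pne)               # weight of evidence for this bin
    iv += (pe - pne) * woe                 # this bin's contribution to IV
    print(f"{name}: WoE={woe:+.3f}")

print(f"IV={iv:.3f}")  # by common rules of thumb, IV > 0.3 is a strong predictor
```

In practice you would replace each bin's raw value with its WoE before fitting the logistic regression, and use IV to rank features.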
In this hands-on project, we will train a linear regression model to predict life expectancy. The dataset was originally obtained from the World Health Organization (WHO) and United Nations websites. The data contains features such as year, status, life expectancy, adult mortality, infant deaths, percentage expenditure, and alcohol consumption.
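The modelling step can be sketched as follows. Since the WHO CSV is not bundled here, this sketch generates synthetic stand-in features with a known linear relationship; the feature names and coefficients are invented, and with the real file you would load it with `pandas.read_csv` instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the WHO data: four numeric features loosely playing
# the role of adult mortality, infant deaths, expenditure, and alcohol.
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(0, 1, size=(n, 4))
life_expectancy = 75 - 20 * X[:, 0] - 5 * X[:, 1] + rng.normal(0, 1, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, life_expectancy, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)       # R^2 on held-out data
print(f"test R^2 = {r2:.2f}")
```

The held-out R² tells us how much of the variation in life expectancy the linear model explains on unseen rows.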
What is the need for Ridge and Lasso regression? When we fit a linear model to the training data and then move to the testing phase, high variance can make the model over-fit: it will not generalize well and will not provide adequate accuracy on future data. Ridge and lasso regression were introduced to reduce this overfitting. Both are powerful regularization techniques, differing only slightly, used to build models that are efficient and computationally tractable. Regularization works by adding a penalty on the size of the model's coefficients, shrinking them to prevent over-fitting.
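The contrast between the two penalties can be seen in a small sketch. On data with a few informative features and many irrelevant ones (invented here for illustration), ridge shrinks every coefficient toward zero, while lasso drives many of them to exactly zero; the `alpha` values are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Illustrative data: 5 informative features out of 20, modest noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:5] = [3, -2, 1.5, 2.5, -1]
y = X @ true_w + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # shrinks all coefficients toward 0
lasso = Lasso(alpha=0.1).fit(X, y)    # sets many coefficients to exactly 0

print("nonzero OLS coefficients:  ", int((np.abs(ols.coef_) > 1e-6).sum()))
print("nonzero Lasso coefficients:", int((np.abs(lasso.coef_) > 1e-6).sum()))
```

Lasso's exact zeros are why it doubles as a feature-selection tool, while ridge is usually preferred when all features carry some signal.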
I'm almost certain that by now you want to learn about these branches in greater detail. Worry not, I'll surely open the gates to these subsets in the posts to come. If you missed my post, you can find it at the following link: Branches of Artificial Intelligence. Previously, we discussed Machine Learning. We also discussed its subsets -- Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
So what machine learning model are we building today? In this article, we are going to build a regression model using the random forest algorithm on the solubility dataset. After model building, we are going to apply the model to make predictions, then evaluate its performance and visualize the results. So which dataset are we going to use? The default answer might be a toy dataset such as the Iris dataset (classification) or the Boston housing dataset (regression).
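The core of the workflow looks like the sketch below. Since the solubility data is not bundled here, it uses synthetic stand-in descriptors with an invented nonlinear relationship; with the real dataset you would swap in its molecular descriptors and measured log solubility.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for four molecular descriptors and a target with a
# nonlinear term, which a random forest can capture but a line cannot.
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 4))
y = 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(0, 0.3, 400)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)
rf = RandomForestRegressor(n_estimators=200, random_state=7)
rf.fit(X_train, y_train)

r2 = r2_score(y_test, rf.predict(X_test))   # performance on held-out data
print(f"test R^2 = {r2:.2f}")
```

A scatter plot of predicted versus actual values on the test set is the usual next step for visualizing these results.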
In this new series, I took it upon myself to improve my coding skills and habits by writing clean, reusable, well-documented code with test cases. This is the first part of the series, where I implement Linear, Polynomial, Ridge, Lasso, and ElasticNet Regression from scratch in an object-oriented manner. We'll start with a simple LinearRegression class and then build upon it, creating an entire module of linear models in a style similar to Scikit-Learn. My implementations are in no way optimal solutions and are only meant to increase our understanding of machine learning. In the repository you will find all of the code from this blog and more, including test cases for every class and function.
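As a preview of the starting point, here is a minimal sketch of what such a scikit-learn-style class might look like; the class body and attribute names are my own illustration, not the series' final implementation.

```python
import numpy as np

class LinearRegression:
    """Minimal OLS regressor with a scikit-learn-style fit/predict API."""

    def fit(self, X, y):
        Xb = np.c_[np.ones(len(X)), X]        # prepend a bias column
        beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        self.intercept_, self.coef_ = beta[0], beta[1:]
        return self                            # allow method chaining

    def predict(self, X):
        return self.intercept_ + X @ self.coef_

# Sanity check on noiseless data: the fit should recover the line exactly.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 4.0 + 2.5 * X[:, 0]
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # ~4.0, [~2.5]
```

Returning `self` from `fit` and exposing `intercept_`/`coef_` mirror scikit-learn's conventions, which keeps the later Ridge and Lasso subclasses drop-in compatible.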
Linear algebra is a branch of mathematics that is extremely useful in data science and machine learning; arguably, it is the most important math skill in the field. Most machine learning models can be expressed in matrix form, and a dataset itself is often represented as a matrix. Linear algebra is used in data preprocessing, data transformation, and model evaluation.
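The "models in matrix form" point can be shown in a few lines: with the dataset as a matrix (rows = samples, columns = features), a linear model's predictions for every sample are a single matrix-vector product. The numbers here are arbitrary illustration values.

```python
import numpy as np

# Dataset as a matrix: each row is one sample, each column one feature.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])   # one weight per feature

# Predictions for the whole dataset in one matrix-vector product.
y_hat = X @ w
print(y_hat)  # [-1.5 -2.5 -3.5]
```

The same pattern scales to millions of rows, which is why vectorized linear algebra, rather than Python loops, underpins practical machine learning code.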