Open Machine Learning Course. Topic 6. Feature Engineering and Feature Selection
In this course, we have already seen several key machine learning algorithms. However, before moving on to fancier ones, we'd like to take a small detour and talk about data preparation. The well-known principle of "garbage in, garbage out" applies fully to any machine learning task. Any experienced practitioner can recall numerous cases where a simple model trained on high-quality data outperformed a complicated multi-model ensemble built on unclean data. This article will contain almost no math, but there will be a fair amount of code. Some examples will use the dataset from the Renthop company, which was used in the Two Sigma Connect: Rental Listing Inquiries Kaggle competition. In this task, you need to predict the popularity of a new rental listing, i.e. classify each listing into one of three classes: ['low', 'medium', 'high']. To evaluate solutions, we will use the log loss metric (lower is better).
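Multiclass log loss is the average negative log of the probability a model assigns to each listing's true class, so confident wrong answers are punished much harder than hesitant ones. A minimal sketch of the metric follows. It assumes predictions arrive as one probability row per listing in the column order low, medium, high. The function name `multiclass_log_loss`, the `eps` clipping value, and the row renormalization step are illustrative choices, not taken from the competition rules.

```python
import math

# Assumed column order of each prediction row (hypothetical convention).
CLASSES = ["low", "medium", "high"]


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean of -log(p_true) over all listings.

    y_true: list of class labels, e.g. ["low", "high"]
    y_prob: list of probability rows, one per listing, ordered as CLASSES
    eps:    clipping bound so that a zero probability does not give log(0)
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Renormalize the row in case the probabilities do not sum exactly to 1.
        row_sum = sum(probs)
        p = probs[CLASSES.index(label)] / row_sum
        # Clip into [eps, 1 - eps] to keep the logarithm finite.
        p = min(max(p, eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Usage: two listings, predicted fairly confidently and correctly.
loss = multiclass_log_loss(["low", "high"], [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
print(round(loss, 4))
```

A uniform prediction of 1/3 for every class scores ln 3 ≈ 1.0986, which is a useful baseline: a model is only learning something if it beats that value. scikit-learn's `sklearn.metrics.log_loss` computes the same quantity, but it orders string labels alphabetically, so the probability columns must be arranged to match.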
Apr-21-2018, 00:26:24 GMT