Feature Engineering for Machine Learning: A Comprehensive Overview
Feature engineering is the process of using domain knowledge of the data to transform existing features or to create new variables from them, for use in machine learning. Raw data is almost never suitable for training machine learning algorithms. As a result, data scientists devote a substantial amount of time to pre-processing the variables before using them in a model. Feature engineering is therefore an umbrella term that covers many techniques, from filling in missing values, to encoding categorical variables, to transforming variables, to creating new variables from existing ones. In this post, I highlight the main feature engineering techniques used to process the data and leave it ready for machine learning. I describe what each technique entails and say a few words about when to use it.
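To make the four families of techniques named above a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the toy records, the column names (age, city, income, household_size) and the specific choices (median imputation, one-hot encoding, a log transform, a ratio feature) are all hypothetical and were picked for brevity. In practice this kind of work is usually done with libraries such as pandas or scikit-learn rather than by hand.

```python
import math
from statistics import median

# Hypothetical toy records (not from any real dataset); None marks a missing value.
rows = [
    {"age": 25, "city": "Paris", "income": 30000.0, "household_size": 1},
    {"age": None, "city": "Lima", "income": 52000.0, "household_size": 4},
    {"age": 41, "city": "Paris", "income": 88000.0, "household_size": 2},
]

# 1. Missing-value imputation: use the median of the observed ages.
age_median = median(r["age"] for r in rows if r["age"] is not None)

# 2. Categorical encoding: collect the categories for one-hot encoding.
cities = sorted({r["city"] for r in rows})

features = []
for r in rows:
    out = {"age": r["age"] if r["age"] is not None else age_median}
    for c in cities:
        # One binary indicator column per category.
        out[f"city_{c}"] = 1 if r["city"] == c else 0
    # 3. Variable transformation: log-transform income (assumes income > 0).
    out["log_income"] = math.log(r["income"])
    # 4. Feature creation: combine two existing variables into a new one.
    out["income_per_person"] = r["income"] / r["household_size"]
    features.append(out)

for row in features:
    print(row)
```

Each step here stands in for a whole family of techniques. The median, for example, is just one of several reasonable imputation choices, and which one fits depends on the variable and the model.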