The artificial intelligence market is fueled by the potential to use data to change the world. While many organizations have already adapted successfully to this paradigm, applying machine learning to new problems remains challenging. The single biggest technical hurdle is that machine learning algorithms need processed data to work: they can only make predictions from numeric data. This data is composed of relevant variables, known as "features," and the process of extracting these numeric features is called "feature engineering."
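As a minimal sketch of what feature engineering means in practice, the snippet below turns hypothetical raw customer records (the table and column names are illustrative, not from the project) into purely numeric features using pandas:

```python
import pandas as pd

# Hypothetical raw customer records: categorical and date fields
# that a model cannot consume directly.
raw = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "plan": ["basic", "premium", "basic"],
    "signup_date": pd.to_datetime(["2023-01-05", "2023-03-20", "2023-06-11"]),
})

# Engineer numeric features: one-hot encode the plan and extract
# the signup month as an integer.
features = pd.get_dummies(raw, columns=["plan"])
features["signup_month"] = features["signup_date"].dt.month
features = features.drop(columns=["signup_date"])
print(features)
```

Every column in the resulting table is numeric, so any standard learning algorithm can consume it.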
This is the fourth article in a four-part series on how we approach machine learning at Feature Labs. These articles cover the concepts and a full implementation as applied to predicting customer churn. The project Jupyter Notebooks are all available on GitHub, and all of the work documented here was completed with open-source tools and data.

The Machine Learning Modeling Process

The outputs of prediction engineering and feature engineering are a set of label times, historical examples of what we want to predict, and features, predictor variables used to train a model to predict the label.
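To make the two outputs concrete, here is a small hedged sketch (all table and column names are hypothetical, not taken from the project) of how label times and features combine into a training table:

```python
import pandas as pd

# Hypothetical label times: for each customer, a cutoff time and
# the churn outcome we want to predict as of that time.
label_times = pd.DataFrame({
    "customer_id": [1, 2],
    "cutoff_time": pd.to_datetime(["2023-07-01", "2023-07-01"]),
    "churned": [0, 1],
})

# Hypothetical features computed only from history before each cutoff time.
features = pd.DataFrame({
    "customer_id": [1, 2],
    "total_spend": [120.0, 45.5],
    "months_active": [6, 2],
})

# Joining the two yields the training data: predictor variables
# alongside the label the model learns to predict.
training_data = features.merge(label_times, on="customer_id")
X = training_data[["total_spend", "months_active"]]
y = training_data["churned"]
print(training_data)
```

Computing each feature only from data before the cutoff time is what prevents the model from leaking future information into its predictions.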
Anyone who has participated in machine learning hackathons and competitions can attest to how crucial feature engineering can be; it is often the difference between finishing in the top 10 of the leaderboard and falling outside the top 50! I have been a huge advocate of feature engineering ever since I realized its immense potential. But it can be a slow and arduous process when done manually: I have to spend time brainstorming which features to create and analyzing their usefulness from different angles.