Collaborating Authors

Modeling: Teaching a Machine Learning Algorithm to Deliver Business Value


This is the fourth in a four-part series on how we approach machine learning at Feature Labs. These articles cover the concepts and a full implementation as applied to predicting customer churn. The project Jupyter Notebooks are all available on GitHub. All of the work documented here was completed with open-source tools and data. The Machine Learning Modeling Process: the outputs of prediction and feature engineering are a set of label times (historical examples of what we want to predict) and features (predictor variables used to train a model to predict the label).

A framework for feature engineering and machine learning pipelines


We refined this framework through experiments both at data science competitions and at ManoMano (a European DIY & gardening marketplace with 1M daily users). This post introduces two core concepts at the same time: feature engineering (the process of transforming raw data into meaningful features to feed the desired algorithm) and the machine learning pipeline (a sequential data transformation workflow from data collection to prediction). Discussing both together might seem to complicate matters. However, it is key to understand one while keeping the other in mind, because they are heavily linked. They must be applied in coordination for your project to succeed.

The autofeat Python Library for Automatic Feature Engineering and Selection Machine Learning

This paper describes the autofeat Python library, which provides a scikit-learn style linear regression model with automatic feature engineering and selection capabilities. Complex non-linear machine learning models such as neural networks are in practice often difficult to train and even harder to explain to non-statisticians, who require transparent analysis results as a basis for important business decisions. While linear models are efficient and intuitive, they generally provide lower prediction accuracies. Our library provides a multi-step feature engineering and selection process, where first a large pool of non-linear features is generated, from which then a small and robust set of meaningful features is selected, which improve the prediction accuracy of a linear model while retaining its interpretability.
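The generate-then-select idea described above can be illustrated with a minimal sketch in plain Python. This is not the autofeat API or its actual selection procedure: the helper names (`generate_features`, `select_features`, `pearson`), the particular transformations, and the univariate correlation filter used for selection are all assumptions chosen for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def generate_features(columns):
    """Step 1 (hypothetical): expand raw columns into a pool of non-linear candidates.

    `columns` maps a feature name to a list of floats.
    """
    pool = dict(columns)
    for name, values in columns.items():
        pool[f"{name}^2"] = [v * v for v in values]
        # log and sqrt are only defined here for strictly positive inputs
        if all(v > 0 for v in values):
            pool[f"log({name})"] = [math.log(v) for v in values]
            pool[f"sqrt({name})"] = [math.sqrt(v) for v in values]
    names = list(columns)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pool[f"{a}*{b}"] = [x * y for x, y in zip(columns[a], columns[b])]
    return pool


def select_features(pool, target, k):
    """Step 2 (hypothetical): keep the k candidates most correlated with the target."""
    ranked = sorted(pool, key=lambda name: abs(pearson(pool[name], target)), reverse=True)
    return ranked[:k]


# Toy data: the target is exactly the product of the two raw inputs.
columns = {"x1": [1.0, 2.0, 3.0, 4.0, 5.0], "x2": [2.0, 1.0, 4.0, 3.0, 5.0]}
target = [a * b for a, b in zip(columns["x1"], columns["x2"])]

pool = generate_features(columns)
top = select_features(pool, target, k=1)
print(top)
```

On this toy data the product feature is a perfect linear predictor of the target, so a linear model fitted on the selected feature alone would capture the non-linear relationship while staying easy to read. A real selection step would also need to guard against redundant and spuriously correlated candidates, which this sketch does not attempt.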

Feature Engineering - Handling Cyclical Features


I was browsing Twitter yesterday (follow me!) when I came across a tweet by Data Science Renee linking to a Medium article called "Top 6 Errors Novice Machine Learning Engineers Make" by Christopher Dossman. This drew my attention because I'm somewhat new to the field (and even if I weren't, it's always worth reviewing the fundamentals).

Deep Feature Synthesis: How Automated Feature Engineering Works


The artificial intelligence market is fueled by the potential to use data to change the world. While many organizations have already successfully adapted to this paradigm, applying machine learning to new problems is still challenging.