Automatic Dataset Normalization for Feature Engineering in Python

#artificialintelligence

A normalized, relational dataset makes it easier to perform feature engineering. Unfortunately, raw data for machine learning is often stored as a single table, which makes the normalization process tedious and time-consuming. Well, I am happy to introduce you to AutoNormalize, an open-source library that automates the normalization process and integrates seamlessly with Featuretools, another open-source library for automated feature engineering. The normalized dataset can then be returned as either an EntitySet or a collection of DataFrames. Using AutoNormalize makes it easier to get started with Featuretools and gives you a quick preview of what Featuretools can do.
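To make "normalization" concrete, here is a minimal, stdlib-only sketch of the core step: if a column (or group of columns) determines other columns, those columns can be moved into their own parent table. This is not AutoNormalize's API or algorithm. The function names `dependency_accuracy` and `split_on_dependency` are hypothetical, the dependency is supplied by hand (discovering it automatically is what AutoNormalize automates), and the `accuracy` threshold for tolerating a few dirty rows is an assumption for illustration.

```python
from collections import Counter, defaultdict


def _group(rows, lhs, rhs):
    """Count how often each rhs value tuple appears for each lhs value tuple."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[c] for c in lhs)
        groups[key][tuple(row[c] for c in rhs)] += 1
    return groups


def dependency_accuracy(rows, lhs, rhs):
    """Fraction of rows that agree with the dependency lhs -> rhs.

    For each lhs value, the most common rhs value counts as correct and
    every other row with that lhs value counts as a violation.
    """
    if not rows:
        return 1.0
    groups = _group(rows, lhs, rhs)
    consistent = sum(c.most_common(1)[0][1] for c in groups.values())
    return consistent / len(rows)


def split_on_dependency(rows, lhs, rhs, accuracy=1.0):
    """Hypothetical helper: split one flat table into (child, parent) along lhs -> rhs.

    The parent table holds one row per lhs value; the child table keeps the
    lhs columns as a foreign key and drops the rhs columns it no longer needs.
    """
    if dependency_accuracy(rows, lhs, rhs) < accuracy:
        raise ValueError(f"{lhs} -> {rhs} does not hold at accuracy {accuracy}")
    parent = []
    for key, counter in _group(rows, lhs, rhs).items():
        best = counter.most_common(1)[0][0]  # majority value wins for dirty rows
        parent.append(dict(zip(list(lhs) + list(rhs), key + best)))
    child = [{c: v for c, v in row.items() if c not in rhs} for row in rows]
    return child, parent


# A flat "orders" table where customer_city is repeated for every order.
orders = [
    {"order_id": 1, "customer_id": "c1", "customer_city": "Boston", "amount": 20.0},
    {"order_id": 2, "customer_id": "c2", "customer_city": "Denver", "amount": 35.5},
    {"order_id": 3, "customer_id": "c1", "customer_city": "Boston", "amount": 12.0},
]
order_rows, customers = split_on_dependency(orders, ["customer_id"], ["customer_city"])
print(customers)
```

Here `customers` ends up with one row per customer, and `order_rows` keeps only `order_id`, `customer_id`, and `amount`. That parent/child shape is the kind of relational structure a tool like Featuretools can aggregate across.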


The Hitchhiker's Guide to Feature Extraction

#artificialintelligence

Good features are the backbone of any machine learning model, and good feature creation often requires domain knowledge, creativity, and lots of time. TL;DR: this post covers useful feature engineering methods and tricks that I have learned and end up using often, along with some other ideas about feature creation. Have you read about Featuretools yet? If not, you are going to be delighted.


Machine Learning 2.0: Engineering Data-Driven AI Products

arXiv.org Artificial Intelligence

ML 2.0: In this paper, we propose a paradigm shift away from the current practice of creating machine learning models (which requires months-long discovery, exploration, and "feasibility report" generation, followed by re-engineering for deployment) in favor of a rapid, 8-week process of development, understanding, validation, and deployment that can be executed by developers or subject matter experts (non-ML experts) using reusable APIs. This accomplishes what we call a "minimum viable data-driven model," delivering a ready-to-use machine learning model for problems that have not previously been solved with machine learning. We provide provisions for the refinement and adaptation of the "model," with strict enforcement of and adherence to both the scaffolding/abstractions and the process. We imagine that this will bring forth the second phase in machine learning, in which discovery is subsumed by more targeted goals of delivery and impact.


Feature Engineering: What Powers Machine Learning

#artificialintelligence

Each feature is a combination of feature primitives and is built with only data from before the associated cutoff time. The features built by Featuretools are explainable in natural language because they are built up from basic operations.
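The two ideas in that summary, stacking primitives and respecting a cutoff time, can be illustrated with a small stdlib-only sketch. This is not Featuretools' API: the primitive tables `TRANSFORMS` and `AGGREGATIONS`, the function `build_feature`, and the English templates are hypothetical. The sketch also assumes the cutoff is exclusive (a row stamped exactly at the cutoff is ignored), which is a convention chosen here for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical transform primitives: applied value by value.
TRANSFORMS = {
    "ABSOLUTE": (abs, "the absolute value of {x}"),
}

# Hypothetical aggregation primitives: reduce many values to one number.
AGGREGATIONS = {
    "COUNT": (len, "the number of {child}"),
    "MEAN": (mean, "the average of {x} of all {child}"),
    "MAX": (max, "the maximum of {x} of all {child}"),
}


def build_feature(rows, customer_id, cutoff, agg, column=None, transform=None):
    """Compute one (optionally stacked) feature and describe it in English."""
    # Only rows stamped strictly before the cutoff time may contribute.
    usable = [r for r in rows if r["customer_id"] == customer_id and r["time"] < cutoff]
    agg_func, agg_text = AGGREGATIONS[agg]
    if column is None:
        values, x = usable, None  # COUNT works on whole rows
    else:
        values, x = [r[column] for r in usable], column
        if transform is not None:
            t_func, t_text = TRANSFORMS[transform]
            values = [t_func(v) for v in values]
            x = t_text.format(x=column)  # the description stacks like the computation
    # COUNT of nothing is 0; the other aggregations are undefined on no data.
    value = agg_func(values) if values or agg == "COUNT" else None
    description = agg_text.format(x=x, child="transactions")
    return value, f"{description} for this customer before {cutoff:%Y-%m-%d}"


transactions = [
    {"customer_id": "c1", "time": datetime(2020, 1, 5), "amount": 40.0},
    {"customer_id": "c1", "time": datetime(2020, 1, 20), "amount": -10.0},
    {"customer_id": "c1", "time": datetime(2020, 2, 3), "amount": 99.0},  # after cutoff
    {"customer_id": "c2", "time": datetime(2020, 1, 7), "amount": 15.0},
]
cutoff = datetime(2020, 2, 1)

value, text = build_feature(transactions, "c1", cutoff, "MEAN", "amount", "ABSOLUTE")
print(value, "|", text)
```

For customer `c1`, the February transaction is excluded, so the stacked MEAN of ABSOLUTE(amount) averages 40.0 and 10.0 to give 25.0. Because the English description is assembled from the same primitives as the value, the feature stays explainable.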