Feature Engineering for Machine Learning
"Good features allow a simple model to beat a complex model."

We'll see there's an almost infinite number of ways to build new features from existing ones, so once you're aware of the basic techniques described below, the art in Feature Generation is really in gaining the intuition for what to try.

For this article, we'll be describing both Feature Extraction, which generally refers to domain-specific methods of dimensionality reduction, and Feature Generation, accomplished via (i) mapping existing features into a new space, (ii) [...]. We'll be grouping methods by their applicability to the underlying data type.

For timestamp data, periodicity may manifest at more than one time-scale, so, depending on your data, you may wish to decompose a timestamp column into multiple columns, such as: Minute, Hour, Day of Week, Weekday-or-Weekend, Day of Month, Month, Season, or Year. Doing so will also let you use pd.DataFrame.groupby() to perform aggregations, which is in itself one of the most powerful ways to generate new features.
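As a minimal sketch of the timestamp decomposition and groupby aggregation described above, the pandas snippet below splits one timestamp column into several calendar features and then derives an aggregated feature from one of them. The toy data, the column names ("timestamp", "sales"), and the season encoding (meteorological, northern hemisphere) are assumptions made for illustration; they do not come from the article.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative assumptions.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-03-11 09:15:00",  # Friday
        "2022-03-12 18:40:00",  # Saturday
        "2022-03-13 09:05:00",  # Sunday
        "2022-03-14 14:44:00",  # Monday
    ]),
    "sales": [10.0, 30.0, 50.0, 20.0],
})

# Decompose the timestamp into features at several time-scales.
ts = df["timestamp"].dt
df["minute"] = ts.minute
df["hour"] = ts.hour
df["day_of_week"] = ts.dayofweek        # Monday=0 ... Sunday=6
df["is_weekend"] = ts.dayofweek >= 5    # Saturday or Sunday
df["day_of_month"] = ts.day
df["month"] = ts.month
df["year"] = ts.year

# Assumed season encoding: 0=winter (Dec-Feb), 1=spring, 2=summer, 3=autumn.
df["season"] = (ts.month % 12) // 3

# Use groupby on a derived column to build an aggregated feature:
# each row receives the mean sales of its weekday/weekend group.
df["mean_sales_by_weekend"] = (
    df.groupby("is_weekend")["sales"].transform("mean")
)

print(df[["timestamp", "day_of_week", "is_weekend", "mean_sales_by_weekend"]])
```

Using transform rather than a plain aggregation keeps the result aligned with the original rows, so the group statistic can be used directly as a new feature column. Which of the decomposed columns are worth keeping depends on the time-scales at which your data is actually periodic.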
Mar-14-2022, 14:44:04 GMT