Feature Engineering is one of those terms that, on the surface, seems to mean exactly what it says: you want to transform or create something from the data that you have. Okay, fine… but what does that actually mean in real life, when you're sitting in front of your data set and wondering what to do? The term encompasses a variety of methods, each with a variety of sub-methods associated with it. I'm just going to cover some of the main ones to give you an idea of the sort of thing Feature Engineering contains, with some indication of the most widely used techniques. Encoding -- I think this is one of the simplest and most commonly used aspects of Feature Engineering.
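As a minimal sketch of what encoding looks like in practice, here are two common variants applied with pandas: one-hot encoding and ordinal (integer) encoding. The `color` column and its values are hypothetical example data, not taken from any particular data set.

```python
import pandas as pd

# Hypothetical example data: a single categorical column.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encoding: each category becomes its own 0/1 indicator column.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Ordinal (label) encoding: map each category to an integer code
# (pandas assigns codes in alphabetical category order).
df["color_code"] = df["color"].astype("category").cat.codes

print(one_hot.columns.tolist())  # ['color_blue', 'color_green', 'color_red']
print(df["color_code"].tolist())  # [2, 1, 0, 1]
```

One-hot encoding avoids implying an order between categories, at the cost of one extra column per category; ordinal encoding is compact but only appropriate when a spurious ordering won't mislead the model.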
Learning with streaming data has attracted much attention during the past few years. Though most studies consider data streams with fixed features, in real practice the features may be evolvable. For example, features of data gathered by sensors with limited lifespans will change when those sensors are replaced by new ones. In this paper, we propose a novel learning paradigm, Feature Evolvable Streaming Learning, in which old features vanish and new features emerge. Rather than relying only on the current features, we attempt to recover the vanished features and exploit them to improve performance. Specifically, we learn two models, one from the recovered features and one from the current features.
Having irrelevant features in your data can decrease the accuracy of your models and cause them to learn from features that carry no real signal. This is the most comprehensive, yet easy-to-follow, course on feature selection available online. Throughout this course you will learn a variety of techniques used worldwide for variable selection, gathered from data competition websites, white papers, blogs and forums, and from the instructor's experience as a Data Scientist. You will have at your fingertips, all in one place, multiple methods that you can apply to select features from your data set.
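To make the idea concrete, here is a minimal sketch of one widely used family of techniques, univariate filter selection, using scikit-learn. The synthetic data set and the choice of `k=3` are illustrative assumptions, not part of any course material.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 10 features, of which only 3 are informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Univariate filter: score each feature independently with the ANOVA
# F-statistic and keep the k highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)  # (200, 3)
```

Filter methods like this are fast and model-agnostic; wrapper and embedded methods (e.g. recursive feature elimination, L1 regularization) trade extra compute for selections tailored to a specific model.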
The best results come down to you, the practitioner, crafting the features. Feature importance and selection can inform you about the objective utility of features, but those features have to come from somewhere: you need to create them manually. This requires spending a lot of time with actual sample data (not aggregates) and thinking about the underlying form of the problem, the structures in the data, and how best to expose them to predictive modeling algorithms. With tabular data, this often means a mixture of aggregating or combining features to create new features, and decomposing or splitting features to create new ones.
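Both moves can be sketched in a few lines of pandas. The order data below, including the column names, is a hypothetical example chosen to illustrate combining two columns into one feature and decomposing a timestamp into several.

```python
import pandas as pd

# Hypothetical order data (illustrative column names and values).
orders = pd.DataFrame({
    "order_time": pd.to_datetime(["2023-01-02 09:30", "2023-06-15 18:45"]),
    "total_price": [120.0, 80.0],
    "n_items": [4, 2],
})

# Combining: derive a per-item price from two existing columns.
orders["price_per_item"] = orders["total_price"] / orders["n_items"]

# Decomposing: split a timestamp into parts a model can use directly.
orders["hour"] = orders["order_time"].dt.hour
orders["weekday"] = orders["order_time"].dt.dayofweek  # Monday = 0
orders["is_weekend"] = orders["weekday"] >= 5
```

The point is not these particular columns but the habit: look at a raw field and ask both "what can I derive by combining it with another field?" and "what components is it hiding?"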
Machine learning techniques have been widely applied in Internet companies for various tasks, acting as an essential driving force, and feature engineering is generally recognized as a crucial step in constructing machine learning systems. Recently, a growing effort has been made to develop automatic feature engineering methods, so that the substantial and tedious manual effort can be spared. However, for industrial tasks, the efficiency and scalability of these methods are still far from satisfactory. In this paper, we propose a staged method named SAFE (Scalable Automatic Feature Engineering), which provides excellent efficiency and scalability, along with the requisite interpretability and promising performance. Extensive experiments show that the proposed method achieves prominent efficiency and competitive effectiveness compared with other methods. Moreover, its scalability allows it to be deployed in large-scale industrial tasks.