Good features are the backbone of any machine learning model, and creating them often takes domain knowledge, creativity, and a lot of time. TL;DR: this post covers useful feature engineering methods and tricks that I have learned and end up using often, plus some other ideas for thinking about feature creation. Have you read about featuretools yet? If not, you are going to be delighted.
This post is by Gal Oshri, a Program Manager in the Data Group at Microsoft. RFM (recency, frequency, monetary value) is a simple and intuitive technique for segmenting customers, and marketers have used it for decades. Despite its simplicity, RFM also has surprising value in machine learning applications. This blog post describes how a generic technique has allowed us to come within 1% of the accuracy of winning solutions in various ML competitions, such as placing in the top 30 entries of the KDD Cup 2015 and gaining 502 positions on the leaderboard of an AirBnB Kaggle competition. RFM has been widely used in direct marketing and database marketing to identify the customers who are most likely to respond or make a purchase.
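To make the idea concrete, here is a minimal sketch of how RFM features might be computed from a transaction log. It is not the code from the post: the function name `rfm_features`, the `(customer_id, purchase_date, amount)` tuple layout, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, snapshot):
    """Compute recency, frequency and monetary value per customer.

    transactions: iterable of (customer_id, purchase_date, amount) tuples
                  (a hypothetical layout chosen for this sketch).
    snapshot:     the date relative to which recency is measured.
    """
    by_customer = defaultdict(list)
    for customer_id, purchase_date, amount in transactions:
        by_customer[customer_id].append((purchase_date, amount))

    features = {}
    for customer_id, rows in by_customer.items():
        last_purchase = max(d for d, _ in rows)
        features[customer_id] = {
            # Recency: days since the most recent purchase.
            "recency_days": (snapshot - last_purchase).days,
            # Frequency: number of purchases observed.
            "frequency": len(rows),
            # Monetary: total amount spent.
            "monetary": sum(a for _, a in rows),
        }
    return features


# Made-up sample data, for illustration only.
transactions = [
    ("a", date(2015, 1, 1), 10.0),
    ("a", date(2015, 1, 10), 20.0),
    ("b", date(2015, 1, 5), 5.0),
]
rfm = rfm_features(transactions, snapshot=date(2015, 1, 15))
```

The three values per customer can then be used as ordinary input columns for a model, or bucketed into segments as marketers traditionally do.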
Malware recognition modules decide whether an object is a threat based on the data they have collected about it. This data may be collected at different phases. Pre-execution phase data is anything you can tell about a file without executing it. This may include executable file format descriptions, code descriptions, binary data statistics, text strings, information extracted via code emulation, and other similar data. In the early days of the cyber era, the number of malware threats was relatively low, and simple handcrafted pre-execution rules were often enough to detect threats. But a decade ago, the tremendous growth of the malware stream meant that anti-malware solutions could no longer rely solely on the expensive manual creation of detection rules.
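As one illustration of what "binary data statistics" could look like as features, the sketch below computes a few simple statistics over a file's raw bytes without executing it. This is an assumption about the general kind of feature meant, not a description of any vendor's detection pipeline; the function name `byte_statistics` and the chosen statistics are hypothetical.

```python
import math
from collections import Counter


def byte_statistics(data: bytes) -> dict:
    """Return simple pre-execution statistics for a blob of bytes."""
    total = len(data)
    counts = Counter(data)  # iterating over bytes yields ints in 0..255

    # Shannon entropy in bits per byte; defined as 0.0 for empty input.
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)

    return {
        "size": total,
        "distinct_bytes": len(counts),
        "entropy_bits": entropy,
    }


# Toy input: two byte values, each appearing half the time.
stats = byte_statistics(bytes([0, 0, 1, 1]))
```

High byte entropy is often associated with compressed or encrypted content, which is why a statistic like this can be a useful input to a classifier, though on its own it says nothing definitive about whether a file is malicious.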
Time series prediction problems are common in the retail domain. Companies like Walmart and Target need to keep track of how much product should be shipped from distribution centres to stores. Even a small improvement in such a demand forecasting system can save a lot of money in terms of workforce management, inventory cost, and out-of-stock losses. While there are many techniques for this problem, such as ARIMA, Prophet, and LSTMs, we can also treat it as a regression problem and use trees to solve it. In this post, we will try to solve the time series problem using XGBoost.
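Treating a forecast as regression usually means turning the series into rows of lagged values with the next value as the target. The sketch below shows one way to do that reshaping; the helper name `make_lag_rows`, the choice of three lags, and the sales numbers are assumptions for illustration and are not taken from the post.

```python
def make_lag_rows(series, n_lags):
    """Turn a sequence into (lag_features, target) pairs.

    For each time step t >= n_lags, the features are the n_lags values
    immediately before t and the target is the value at t.
    """
    rows = []
    for t in range(n_lags, len(series)):
        lags = list(series[t - n_lags:t])
        rows.append((lags, series[t]))
    return rows


# Made-up weekly sales figures, for illustration only.
weekly_sales = [10, 12, 13, 15, 14, 18]
rows = make_lag_rows(weekly_sales, n_lags=3)

X = [features for features, _ in rows]  # one row of lagged values per step
y = [target for _, target in rows]      # the value to predict at that step
```

`X` and `y` are now in the shape any tabular regressor expects, so they could be passed to a tree-based model such as `xgboost.XGBRegressor`. When evaluating, split by time rather than at random so the model is never trained on values that come after the ones it is tested on.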