How I Predicted A Dataset Without Encoding Or Scaling Features
Do you want a quick model that does not require you to separate the numerical columns from the categorical ones, does not require you to ordinal encode or one-hot encode them, and does not require you to standardise the independent variables? If your answer is yes, then maybe you need to try CatBoost.

CatBoost is an open-source library, based on the concept of gradient boosting, developed by the Russian company Yandex. CatBoost is an especially powerful library because it yields state-of-the-art results without the extensive data preparation typically required by other machine learning methods, and it provides powerful out-of-the-box support for the more descriptive data formats that accompany many business problems.

To show that CatBoost can make predictions on categorical data that has not been encoded or scaled, I selected a very popular dataset to experiment on: Kaggle's Ames House Price dataset, which forms part of the House Prices -- Advanced Regression Techniques competition on Kaggle. The competition page describes it as follows:

"Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence. With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home. The Ames Housing dataset was compiled by Dean De Cock for use in data science education. It's an incredible alternative for data scientists looking for a modernized and expanded version of the often cited Boston Housing dataset."
Dec-13-2020, 01:07:44 GMT