Dealing with sparse categorical variables in predictive modeling

One of the biggest challenges a data scientist faces is finding an efficient way to encode qualitative features numerically. Most predictive models accept only numerical inputs, so categorical variables must first be converted into a numerical representation. The best-known method is one-hot encoding, which works by creating dummy variables: if a qualitative column has n modalities, n binary columns (one per modality) are added to the dataset. Dummy encoding is usually an effective and flexible way to reach good performance. In some situations, however, other methods such as "frequency encoding" and "target encoding" are worth exploring. For instance, a column with many modalities turns into a large number of sparse, mostly-zero dummy columns.
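To make the three encodings concrete, here is a minimal, dependency-free sketch in Python. It assumes a single categorical column stored as a plain list. The function names, the toy `city`/`bought` data and the `smoothing` parameter are hypothetical, chosen for illustration only, and are not taken from any particular library. One-hot encoding adds one binary column per modality. Frequency encoding replaces each modality by its relative frequency in the column (raw counts are a common variant). Target encoding replaces each modality by the mean of the target observed for that modality.

```python
from collections import Counter, defaultdict


def one_hot_encode(values):
    """Return (modalities, rows): one binary column per distinct modality."""
    modalities = sorted(set(values))
    rows = [[1 if v == m else 0 for m in modalities] for v in values]
    return modalities, rows


def frequency_encode(values):
    """Replace each modality by its relative frequency in the column."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]


def target_encode(values, targets, smoothing=0.0):
    """Replace each modality by the mean target observed for that modality.

    `smoothing` is a hypothetical parameter for this sketch: it blends the
    per-modality mean with the global target mean (the prior), which pulls
    rare modalities toward the prior:
        (sum_of_targets + smoothing * prior) / (count + smoothing)
    """
    prior = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = Counter()
    for v, y in zip(values, targets):
        sums[v] += y
        counts[v] += 1
    return [(sums[v] + smoothing * prior) / (counts[v] + smoothing) for v in values]


# Hypothetical toy data: a city column and a binary target.
city = ["Paris", "Lyon", "Paris", "Nice", "Paris", "Lyon"]
bought = [1, 0, 1, 0, 0, 1]

mods, onehot = one_hot_encode(city)             # 3 modalities -> 3 new columns
freq = frequency_encode(city)                   # Paris -> 3/6, Lyon -> 2/6, Nice -> 1/6
te = target_encode(city, bought)                # Paris -> 2/3, Lyon -> 1/2, Nice -> 0
te_smooth = target_encode(city, bought, smoothing=2.0)
```

Note that frequency and target encoding each produce a single numerical column regardless of how many modalities there are, whereas one-hot encoding grows with the number of modalities. Because target encoding uses the target itself, in practice the per-modality means should be computed on training data only (for example out-of-fold) to avoid leaking the target into the features.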
