Mistakes in Applying Univariate Feature Selection Methods
The Python scikit-learn library offers several univariate feature selection methods, such as the regression F-score, ANOVA and the chi-squared test. These methods are easy to apply, sometimes with a single line of code. Perhaps because of this, it can be tempting to use them without considering what type of features you have. I have seen some machine learning practitioners take this for granted and make this mistake, myself included. The scikit-learn documentation is clear on which feature selection methods suit regression and which suit classification. However, it does not say whether these methods suit continuous features, categorical features, or both. Let's say you have a classification task, and after reading the documentation you know you should use either the chi-squared test or ANOVA.
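The sketch below contrasts the two classification options using real scikit-learn functions. It assumes the usual pairing: `f_classif` (the ANOVA F-test) for continuous features, and `chi2` for non-negative categorical features such as one-hot indicators. The `colour` feature is hypothetical, made from random data purely for illustration. Note that `chi2` only checks that inputs are non-negative. It will happily score raw continuous measurements, so the library won't catch the mismatch for you. It only raises an error once the values go negative, for example after standardization.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif
from sklearn.preprocessing import OneHotEncoder

# Continuous features (iris measurements) with a categorical target:
# the ANOVA F-test (f_classif) is the usual choice here.
X_cont, y = load_iris(return_X_y=True)
anova = SelectKBest(score_func=f_classif, k=2).fit(X_cont, y)
print("ANOVA F-scores:", np.round(anova.scores_, 1))

# Hypothetical categorical feature (random, for illustration only),
# one-hot encoded into 0/1 columns that chi2 can treat as frequencies.
rng = np.random.default_rng(0)
colour = rng.choice(["red", "green", "blue"], size=len(y)).reshape(-1, 1)
X_cat = OneHotEncoder().fit_transform(colour)  # sparse 0/1 matrix, 3 columns
chi = SelectKBest(score_func=chi2, k=2).fit(X_cat, y)
print("Chi-squared scores:", np.round(chi.scores_, 2))

# chi2 rejects negative inputs, such as standardized continuous features.
X_std = (X_cont - X_cont.mean(axis=0)) / X_cont.std(axis=0)
chi2_rejected_negative = False
try:
    chi2(X_std, y)
except ValueError as err:
    chi2_rejected_negative = True
    print("chi2 on standardized data failed:", err)
```

Which method fits is decided by the feature type, not only by the task type. The code runs either way, so checking the feature type is left to you.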
Nov-20-2020, 15:41:06 GMT