Gini Impurity is named after the Italian statistician Corrado Gini. Gini impurity can be understood as a criterion to minimize the probability of misclassification. To understand the definition (as shown in the figure) and exactly how we can build up a decision tree, let's get started with a very simple data-set, where depending on various weather conditions, we decide whether to play an outdoor game or not. From the definition, a data-set containing only one class will have 0 Gini Impurity. In building up the decision tree our idea is to choose the feature with least Gini Impurity as root node and so on... Let's get started with the simple data-set -- Here we see that depending on 4 features (Outlook, Temperature, Humidity, Wind), decision is made on whether to play tennis or not.
Jan-15-2020, 09:07:15 GMT