Gini Impurity is named after the Italian statistician Corrado Gini. Gini impurity can be understood as a criterion to minimize the probability of misclassification. To understand the definition (as shown in the figure) and exactly how we can build up a decision tree, let's get started with a very simple data-set, where depending on various weather conditions, we decide whether to play an outdoor game or not. From the definition, a data-set containing only one class will have 0 Gini Impurity. In building up the decision tree our idea is to choose the feature with least Gini Impurity as root node and so on... Let's get started with the simple data-set -- Here we see that depending on 4 features (Outlook, Temperature, Humidity, Wind), decision is made on whether to play tennis or not.

Decision Trees are great and are useful for a variety of tasks. They form the backbone of most of the best performing models in the industry like XGboost and Lightgbm. But how do they work exactly? In fact, this is one of the most asked questions in ML/DS interviews. We generally know they work in a stepwise manner and have a tree structure where we split a node using some feature on some criterion.

Gatto, Joseph, Lanka, Ravi, Iwashita, Yumi, Stoica, Adrian

Have you ever wondered how your feature space is impacting the prediction of a specific sample in your dataset? In this paper, we introduce Single Sample Feature Importance (SSFI), which is an interpretable feature importance algorithm that allows for the identification of the most important features that contribute to the prediction of a single sample. When a dataset can be learned by a Random Forest classifier or regressor, SSFI shows how the Random Forest's prediction path can be utilized for low-level feature importance calculation. SSFI results in a relative ranking of features, highlighting those with the greatest impact on a data point's prediction. We demonstrate these results both numerically and visually on four different datasets.

Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset. In simple terms, Gini impurity is the measure of impurity in a node. So to understand the formula a little better, let us talk specifically about the binary case where we have nodes with only two classes. So in the below five examples of candidate nodes labelled A-E and with the distribution of positive and negative class shown, which is the ideal condition to be in? I reckon you would say A or E and you are right.

We investigate how asymmetrizing an impurity function affects the choice of optimal node splits when growing a decision tree for binary classification. In particular, we relax the usual axioms of an impurity function and show how skewing an impurity function biases the optimal splits to isolate points of a particular class when splitting a node. We give a rigorous definition of this notion, then give a necessary and sufficient condition for such a bias to hold. We also show that the technique of class weighting is equivalent to applying a specific transformation to the impurity function, and tie all these notions together for a class of impurity functions that includes the entropy and Gini impurity. We also briefly discuss cost-insensitive impurity functions and give a characterization of such functions.

Imagine you are out to buy a cell phone for yourself. Shopkeeper asks,"How can I help you Ma'am?" "I am looking for a cell phone" "You are at the right place, we have over 300 different types of cell phones, what kind of phone would you like to buy today?" Decision paralysis hits you, totally confused among so many choices of phones you go blank! "Let me help you choose a phone ma'am. What screen size would you like?" "Umm… larger than 5.9 inches" "Perfect, and how about the camera?"

One challenge of neural or deep architectures is that it is difficult to determine what exactly is going on in the machine learning algorithm that makes a classifier decide how to classify inputs. This is a huge problem in deep learning: we can get fantastic classification accuracies, but we don't really know what criteria a classifier uses to make its classification decision. However, decision trees can present us with a graphical representation of how the classifier reaches its decision.