 play cricket


How to select Best Split in Decision Trees using Chi-Square

#artificialintelligence

Let's see how we can calculate the expected values. Recall the split on "Performance in class": there are 20 students in total, of whom 10 play cricket and 10 do not, so 50% of the students in the parent node play cricket. Now consider the "Above average" node, which contains 14 students. If the split had no effect, this node would show the same 50% share as the parent, so the expected number of students who play cricket is 14 × 0.5 = 7. The actual number is 8. We now have both values we need: the expected value and the actual value.
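The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `expected_counts` and `chi_square` are made up for this example, the count of 6 students who do not play cricket in the "Above average" node is derived as 14 - 8, and the statistic shown is the standard Pearson form, the sum of (actual - expected)² / expected, which may differ from the exact variant the article goes on to use.

```python
def expected_counts(node_total, parent_class_counts):
    """Expected class counts in a child node if it mirrored the parent's class mix."""
    parent_total = sum(parent_class_counts.values())
    return {
        cls: node_total * count / parent_total
        for cls, count in parent_class_counts.items()
    }


def chi_square(actual, expected):
    """Pearson chi-square for one node (assumed form): sum of (actual - expected)^2 / expected."""
    return sum((actual[cls] - expected[cls]) ** 2 / expected[cls] for cls in actual)


# Parent node from the text: 20 students, 10 play cricket and 10 do not.
parent = {"play": 10, "not_play": 10}

# "Above average" node: 14 students, 8 of whom play cricket.
# The 6 who do not play is derived (14 - 8); it is not stated in the text.
above_avg_actual = {"play": 8, "not_play": 6}

above_avg_expected = expected_counts(14, parent)
above_avg_chi = chi_square(above_avg_actual, above_avg_expected)

print(above_avg_expected)
print(above_avg_chi)
```

A larger value means the node's class mix departs further from the parent's, which is what makes a split more attractive under this criterion.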


Decision Trees for Classification: A Machine Learning Algorithm

#artificialintelligence

Reading and interpreting a decision tree does not require any statistical knowledge. Its graphical representation is very intuitive, and users can easily relate it to their hypotheses.
Useful in data exploration: A decision tree is one of the fastest ways to identify the most significant variables and the relationships between two or more variables. With the help of decision trees, we can identify the features that have the most power to predict the target variable. For example, if we are working on a problem with information spread across hundreds of variables, a decision tree will help identify the most significant ones.
Less data cleaning required: It requires less data cleaning than some other modeling techniques. To a fair degree, it is not influenced by outliers and missing values.
Data type is not a constraint: It can handle both numerical and categorical variables.