From Decision Trees and Random Forests to Gradient Boosting
Suppose we wish to perform supervised learning on a classification problem: deciding whether an incoming email is spam. The spam dataset consists of 4601 emails, each labelled either as genuine, i.e. not spam (0), or as spam (1). The data also contain a large number of predictors (57), each of which is either a character count or the frequency of occurrence of a particular word or symbol. In this short article, we briefly cover the main concepts in tree-based classification and compare and contrast the most popular methods. This dataset and several worked examples are covered in detail in The Elements of Statistical Learning, second edition.
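To make the comparison concrete, here is a minimal sketch using scikit-learn that fits a single decision tree, a random forest and a gradient boosting classifier, then reports held-out accuracy for each. It assumes scikit-learn is installed. It also uses a synthetic dataset of the same shape as the spam data (4601 rows, 57 predictors) with an assumed 60/40 class split as a stand-in for the real emails, so the printed numbers say nothing about the actual spam results. The function name `compare_tree_models` and all hyperparameters are illustrative choices, not the article's own setup.

```python
# Sketch: compare three tree-based classifiers with scikit-learn.
# ASSUMPTION: synthetic data shaped like the spam data (4601 rows, 57 predictors)
# stands in for the real emails; substitute the real dataset to get meaningful results.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_tree_models(random_state: int = 0) -> dict:
    """Fit a single tree, a random forest and gradient boosting; return test accuracies."""
    # Synthetic binary problem: label 0 = not spam, 1 = spam (illustrative only).
    X, y = make_classification(
        n_samples=4601,
        n_features=57,
        n_informative=20,
        weights=[0.6, 0.4],  # assumed class balance, not taken from the real data
        random_state=random_state,
    )
    # Hold out 30% of the rows, preserving the class proportions in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )

    models = {
        # A single fully grown tree: flexible but high-variance.
        "decision_tree": DecisionTreeClassifier(random_state=random_state),
        # Many trees on bootstrap samples with random feature subsets, votes averaged.
        "random_forest": RandomForestClassifier(
            n_estimators=200, random_state=random_state
        ),
        # Shallow trees added sequentially, each fitted to the negative gradient
        # of the loss (pseudo-residuals) of the current ensemble.
        "gradient_boosting": GradientBoostingClassifier(random_state=random_state),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # score() returns mean accuracy on the held-out split.
        scores[name] = model.score(X_test, y_test)
    return scores


if __name__ == "__main__":
    for name, acc in compare_tree_models().items():
        print(f"{name:>18}: {acc:.3f}")
```

On data like this one would typically expect the two ensembles to outperform the single tree, since averaging (random forests) and sequential correction (boosting) both reduce the error of an individual tree; the exact ranking depends on the data and the tuning.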
Nov-19-2020, 11:50:36 GMT