 Decision Tree Learning


XGBoost Alternative Base Learners

#artificialintelligence

XGBoost, short for "Extreme Gradient Boosting," is one of the strongest machine learning algorithms for tabular data, a reputation it has earned by winning numerous Kaggle competitions. XGBoost is an ensemble machine learning algorithm that usually consists of Decision Trees. The Decision Trees that make up XGBoost are individually referred to as gbtree, short for "gradient boosted tree." The first Decision Tree in the XGBoost ensemble is the base learner whose mistakes all subsequent trees learn from. Although Decision Trees are generally preferred as base learners because of their excellent ensemble scores, alternative base learners may outperform them in some cases.


Analysis, Characterization, Prediction and Attribution of Extreme Atmospheric Events with Machine Learning: a Review

arXiv.org Artificial Intelligence

Atmospheric Extreme Events (EEs) cause severe damage to human societies and ecosystems. The frequency and intensity of EEs and other associated events are increasing under current climate change and global warming. The accurate prediction, characterization, and attribution of atmospheric EEs is therefore a key research field, in which many groups are currently working by applying different methodologies and computational tools. Machine Learning (ML) methods have arisen in recent years as powerful techniques to tackle many of the problems related to atmospheric EEs. This paper reviews the ML algorithms applied to the analysis, characterization, prediction, and attribution of the most important atmospheric EEs. A summary of the most used ML techniques in this area, and a comprehensive critical review of literature related to ML in EEs, are provided. A number of examples are discussed, and perspectives and outlooks on the field are drawn.


Decision Trees, Explained

#artificialintelligence

In this post we're going to discuss a commonly used machine learning model called the decision tree. Decision trees are preferred for many applications, mainly due to their high explainability, but also because they are relatively simple to set up and train, and because a prediction with a decision tree takes little time. Decision trees are a natural fit for tabular data, and, in fact, they currently seem to outperform neural networks on that type of data (as opposed to images). Unlike neural networks, trees don't require input normalization, since their training is not based on gradient descent and they have very few parameters to optimize. They can even train on data with missing values, but nowadays this practice is less recommended, and missing values are usually imputed.


How Random Forests & Decision Trees Decide: Simply Explained With An Example In Python

#artificialintelligence

Let's assume that we have a labeled dataset with 10 samples in total. What Decision Trees do is simple: they look for splits of the data that separate the samples of the different classes as much as possible (increasing the class separability). In this example, the perfect split would be at x = 0.9, as this would put the 5 red points on the left side and the 5 blue points on the right side (perfect class separability). Each time we split the space/data like that, we are building a decision tree with a specific rule. Here we start with the root node containing all the data, and then split the data at x = 0.9, producing two branches that lead to two leaf nodes.
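The search for such a split can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's own code: the ten data points are made up so that the classes separate at 0.9, the helper names `gini` and `best_split` are hypothetical, and Gini impurity is assumed as the separability measure (the snippet does not say which criterion is used).

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Try every midpoint between neighbouring x values and return the
    (threshold, weighted_impurity) pair with the lowest weighted impurity."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between two identical values
        threshold = (lo + hi) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical 1-D dataset: 5 red points below 0.9 and 5 blue points above it.
xs = [0.1, 0.3, 0.5, 0.6, 0.8, 1.0, 1.2, 1.4, 1.7, 1.9]
ys = ["red"] * 5 + ["blue"] * 5

threshold, impurity = best_split(xs, ys)
print(f"best threshold: {threshold:.2f}, weighted Gini impurity: {impurity:.2f}")
```

A weighted impurity of zero means both leaves are pure, which corresponds to the perfect class separability described above. A real tree learner would repeat this search recursively inside each branch and across every feature.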


Sequential Permutation Testing of Random Forest Variable Importance Measures

arXiv.org Artificial Intelligence

Hypothesis testing of random forest (RF) variable importance measures (VIMP) remains the subject of ongoing research. Among recent developments, heuristic approaches to parametric testing have been proposed whose distributional assumptions are based on empirical evidence. Other formal tests under regularity conditions were derived analytically. However, these approaches can be computationally expensive or even practically infeasible. This problem also occurs with non-parametric permutation tests, which are, however, distribution-free and can generically be applied to any type of RF and VIMP. Embracing this advantage, it is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational costs associated with conventional permutation tests. The popular and widely used permutation VIMP serves as a practical and relevant application example. The results of simulation studies confirm that the theoretical properties of the sequential tests apply, that is, the type-I error probability is controlled at a nominal level and a high power is maintained with considerably fewer permutations needed in comparison to conventional permutation testing. The numerical stability of the methods is investigated in two additional application studies. In summary, theoretically sound sequential permutation testing of VIMP is possible at greatly reduced computational costs. Recommendations for application are given. A respective implementation is provided through the accompanying R package rfvimptest. The approach can also be easily applied to any kind of prediction model.
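To make the idea of stopping a permutation test early more concrete, here is a rough sketch of one well-known sequential rule for Monte Carlo p-values (the Besag and Clifford style rule: stop as soon as a fixed number `h` of permuted statistics reach the observed one). It is an assumption that this resembles what the paper uses, and it is not the rfvimptest implementation. For simplicity the statistic is an absolute difference in group means rather than a random forest importance, and all names (`sequential_p_value`, `abs_mean_diff`, `h`, `max_perms`) are hypothetical.

```python
import random


def abs_mean_diff(a, b):
    """Toy test statistic: absolute difference between two group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def sequential_p_value(statistic, x, y, h=5, max_perms=200, seed=0):
    """Sequential Monte Carlo p-value with early stopping.

    Permutations are drawn one at a time. Sampling stops as soon as h
    permuted statistics are at least as large as the observed one, or
    after max_perms - 1 permutations. Returns (p_value, permutations_used).
    """
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    exceed = 0
    for used in range(1, max_perms):
        rng.shuffle(pooled)
        perm_stat = statistic(pooled[:len(x)], pooled[len(x):])
        if perm_stat >= observed:
            exceed += 1
            if exceed == h:
                # Early stop: clearly not significant, estimate is h / used.
                return h / used, used
    # Budget exhausted: conventional estimate over max_perms - 1 permutations.
    return (exceed + 1) / max_perms, max_perms - 1


# Identical groups: no effect, so the procedure stops almost immediately.
p_null, used_null = sequential_p_value(abs_mean_diff, list(range(8)), list(range(8)))

# Clearly separated groups: the full permutation budget is typically needed.
p_sep, used_sep = sequential_p_value(abs_mean_diff, list(range(10, 18)), list(range(8)))

print(p_null, used_null)
print(p_sep, used_sep)
```

The saving comes from the null-like cases: when the observed statistic is unremarkable, a handful of permutations is enough to see that, and only the interesting variables consume the full budget.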


Special Issue! Foundational Algorithms, Where They Came From, Where They're Going

#artificialintelligence

Years ago, I had to choose between a neural network and a decision tree learning algorithm. It was necessary to pick an efficient one, because we planned to apply the algorithm to a very large set of users on a limited compute budget. I went with a neural network. I hadn't used boosted decision trees in a while, and I thought they required more computation than they actually do -- so I made a bad call. Fortunately, my team quickly revised my decision, and the project was successful. This experience was a lesson in the importance of learning, and continually refreshing, foundational knowledge. If I had refreshed my familiarity with boosted trees, I would have made a better decision.


Machine Learning: Theory and Hands-on Practice with Python

#artificialintelligence

In the Machine Learning specialization, we will cover Supervised Learning, Unsupervised Learning, and the basics of Deep Learning. You will apply ML algorithms to real-world data, learn when to use which model and why, and improve the performance of your models. Starting with supervised learning, we will cover linear and logistic regression, KNN, Decision trees, ensembling methods such as Random Forest and Boosting, and kernel methods such as SVM. Then we turn our attention to unsupervised methods, including dimensionality reduction techniques (e.g., PCA), clustering, and recommender systems. We finish with an introduction to deep learning basics, including choosing model architectures, building/training neural networks with libraries like Keras, and hands-on examples of CNNs and RNNs.


Decision Tree Classification: Explain It To Me Like I'm 10

#artificialintelligence

Originally published on Towards AI. This is going to be part 4 of the Explaining Machine Learning Algorithms To A 10-Year Old series.


bsnsing: A decision tree induction method based on recursive optimal boolean rule composition

arXiv.org Machine Learning

This paper proposes a new mixed-integer programming (MIP) formulation to optimize split rule selection in the decision tree induction process, and develops an efficient search algorithm that is able to solve practical instances of the MIP model faster than commercial solvers. The formulation is novel for it directly maximizes the Gini reduction, an effective split selection criterion which has never been modeled in a mathematical program for its nonconvexity. The proposed approach differs from other optimal classification tree models in that it does not attempt to optimize the whole tree, therefore the flexibility of the recursive partitioning scheme is retained and the optimization model is more amenable. The approach is implemented in an open-source R package named bsnsing. Benchmarking experiments on 75 open data sets suggest that bsnsing trees are the most capable of discriminating new cases compared to trees trained by other decision tree codes including the rpart, C50, party and tree packages in R. Compared to other optimal decision tree packages, including DL8.5, OSDT, GOSDT and indirectly more, bsnsing stands out in its training speed, ease of use and broader applicability without losing in prediction accuracy.


Data on Machine Learning Described by Researchers at University of New South Wales (Learning from machines to close the gap between funding and expenditure in the Australian National Disability Insurance Scheme): Machine Learning

#artificialintelligence

By a News Reporter-Staff News Editor at Insurance Daily News -- New research on artificial intelligence is the subject of a new report. According to news reporting originating from Canberra, Australia, by NewsRx correspondents, research stated, "The Australian National Disability Insurance Scheme (NDIS) allocates funds to participants for purchase of services." Our news reporters obtained a quote from the research from University of New South Wales: "Only one percent of the 89,299 participants spent all of their allocated funds with 85 participants having failed to spend any, meaning that most of the participants were left with unspent funds. The gap between the allocated budget and realised expenditure reflects misallocation of funds. Thus we employ alternative machine learning techniques to estimate budget and close the gap while maintaining the aggregate level of spending. Three experiments are conducted to test the machine learning models in estimating the budget, expenditure and the resulting gap; compare the learning rate between machines and humans; and identify the significant explanatory variables."