Decision Tree Learning
Mo\"ET: Interpretable and Verifiable Reinforcement Learning via Mixture of Expert Trees
Vasic, Marko, Petrovic, Andrija, Wang, Kaiyuan, Nikolic, Mladen, Singh, Rishabh, Khurshid, Sarfraz
Deep Reinforcement Learning (DRL) has led to many recent breakthroughs on complex control tasks, such as defeating the best human player in the game of Go. However, decisions made by the DRL agent are not explainable, hindering its applicability in safety-critical settings. Viper, a recently proposed technique, constructs a decision tree policy by mimicking the DRL agent. Decision trees are interpretable as each action made can be traced back to the decision rule path that lead to it. However, one global decision tree approximating the DRL policy has significant limitations with respect to the geometry of decision boundaries. We propose Mo\"ET, a more expressive, yet still interpretable model based on Mixture of Experts, consisting of a gating function that partitions the state space, and multiple decision tree experts that specialize on different partitions. We propose a training procedure to support non-differentiable decision tree experts and integrate it into imitation learning procedure of Viper. We evaluate our algorithm on four OpenAI gym environments, and show that the policy constructed in such a way is more performant and better mimics the DRL agent by lowering mispredictions and increasing the reward. We also show that Mo\"ET policies are amenable for verification using off-the-shelf automated theorem provers such as Z3.
Extracting Interpretable Concept-Based Decision Trees from CNNs
Chyung, Conner, Tsang, Michael, Liu, Yan
In an attempt to gather a deeper understanding of how convolutional neural networks (CNNs) reason about human-understandable concepts, we present a method to infer labeled concept data from hidden layer activations and interpret the concepts through a shallow decision tree. The decision tree can provide information about which concepts a model deems important, as well as provide an understanding of how the concepts interact with each other. Experiments demonstrate that the extracted decision tree is capable of accurately representing the original CNN's classifications at low tree depths, thus encouraging human-in-the-loop understanding of discriminative concepts.
Early Detection of Depression: Social Network Analysis and Random Forest Techniques
Background: Major depressive disorder (MDD) or depression is among the most prevalent psychiatric disorders, affecting more than 300 million people globally. Early detection is critical for rapid intervention, which can potentially reduce the escalation of the disorder. Objective: This study used data from social media networks to explore various methods of early detection of MDDs based on machine learning. We performed a thorough analysis of the dataset to characterize the subjects' behavior based on different aspects of their writings: textual spreading, time gap, and time span. Methods: We proposed 2 different approaches based on machine learning singleton and dual.
Robustness Verification of Tree-based Models
Chen, Hongge, Zhang, Huan, Si, Si, Li, Yang, Boning, Duane, Hsieh, Cho-Jui
We study the robustness verification problem for tree-based models, including decision trees, random forests (RFs) and gradient boosted decision trees (GBDTs). Formal robustness verification of decision tree ensembles involves finding the exact minimal adversarial perturbation or a guaranteed lower bound of it. Existing approaches find the minimal adversarial perturbation by a mixed integer linear programming (MILP) problem, which takes exponential time so is impractical for large ensembles. Although this verification problem is NP-complete in general, we give a more precise complexity characterization. We show that there is a simple linear time algorithm for verifying a single tree, and for tree ensembles, the verification problem can be cast as a max-clique problem on a multi-partite graph with bounded boxicity. For low dimensional problems when boxicity can be viewed as constant, this reformulation leads to a polynomial time algorithm. For general problems, by exploiting the boxicity of the graph, we develop an efficient multi-level verification algorithm that can give tight lower bounds on the robustness of decision tree ensembles, while allowing iterative improvement and any-time termination. OnRF/GBDT models trained on 10 datasets, our algorithm is hundreds of times faster than the previous approach that requires solving MILPs, and is able to give tight robustness verification bounds on large GBDTs with hundreds of deep trees.
Random Tessellation Forests
Ge, Shufei, Wang, Shijia, Teh, Yee Whye, Wang, Liangliang, Elliott, Lloyd T.
Space partitioning methods such as random forests and the Mondrian process are powerful machine learning methods for multi-dimensional and relational data, and are based on recursively cutting a domain. The flexibility of these methods is often limited by the requirement that the cuts be axis aligned. The Ostomachion process and the self-consistent binary space partitioning-tree process were recently introduced as generalizations of the Mondrian process for space partitioning with non-axis aligned cuts in the two dimensional plane. Motivated by the need for a multi-dimensional partitioning tree with non-axis aligned cuts, we propose the Random Tessellation Process (RTP), a framework that includes the Mondrian process and the binary space partitioning-tree process as special cases. We derive a sequential Monte Carlo algorithm for inference, and provide random forest methods. Our process is self-consistent and can relax axis-aligned constraints, allowing complex inter-dimensional dependence to be captured. We present a simulation study, and analyse gene expression data of brain tissue, showing improved accuracies over other methods.
A meta-learning recommender system for hyperparameter tuning: predicting when tuning improves SVM classifiers
Mantovani, Rafael Gomes, Rossi, Andrรฉ Luis Debiaso, Alcobaรงa, Edesio, Vanschoren, Joaquin, de Carvalho, Andrรฉ Carlos Ponce de Leon Ferreira
For many machine learning algorithms, predictive performance is critically affected by the hyperparameter values used to train them. However, tuning these hyperparameters can come at a high computational cost, especially on larger datasets, while the tuned settings do not always significantly outperform the default values. This paper proposes a recommender system based on meta-learning to identify exactly when it is better to use default values and when to tune hyperparameters for each new dataset. Besides, an in-depth analysis is performed to understand what they take into account for their decisions, providing useful insights. An extensive analysis of different categories of meta-features, meta-learners, and setups across 156 datasets is performed. Results show that it is possible to accurately predict when tuning will significantly improve the performance of the induced models. The proposed system reduces the time spent on optimization processes, without reducing the predictive performance of the induced models (when compared with the ones obtained using tuned hyperparameters). We also explain the decision-making process of the meta-learners in terms of linear separability-based hypotheses. Although this analysis is focused on the tuning of Support Vector Machines, it can also be applied to other algorithms, as shown in experiments performed with decision trees.
On the Insufficiency of the Large Margins Theory in Explaining the Performance of Ensemble Methods
Martinez, Waldyn, Gray, J. Brian
Boosting and other ensemble methods combine a large number of weak classifiers through weighted voting to produce stronger predictive models. To explain the successful performance of boosting algorithms, Schapire et al. (1998) showed that AdaBoost is especially effective at increasing the margins of the training data. Schapire et al. (1998) also developed an upper bound on the generalization error of any ensemble based on the margins of the training data, from which it was concluded that larger margins should lead to lower generalization error, everything else being equal (sometimes referred to as the ``large margins theory''). Tighter bounds have been derived and have reinforced the large margins theory hypothesis. For instance, Wang et al. (2011) suggest that specific margin instances, such as the equilibrium margin, can better summarize the margins distribution. These results have led many researchers to consider direct optimization of the margins to improve ensemble generalization error with mixed results. We show that the large margins theory is not sufficient for explaining the performance of voting classifiers. We do this by illustrating how it is possible to improve upon the margin distribution of an ensemble solution, while keeping the complexity fixed, yet not improve the test set performance.
DataLearner: A Data Mining and Knowledge Discovery Tool for Android Smartphones and Tablets
Yates, Darren, Islam, Md Zahidul, Gao, Junbin
Smartphones have become the ultimate'personal' computer, yet despite this, general-purpose data mining and knowledge discovery tools for mobile devices are surprisingly rare. DataLearner is a new data mining application designed specifically for Android devices that imports the Weka data mining engine and augments it with algorithms developed by Charles Sturt University. Moreover, DataLearner can be expanded with additional algorithms. Combined, DataLearner delivers 40 classification, clustering and association rule mining algorithms for model training and evaluation without need for cloud computing resources or network connectivity. It provides the same classification accuracy as PCs and laptops, while doing so with acceptable processing speed and consuming negligible battery life. With its ability to provide easy-to-use data mining on a phone-size screen, DataLearner is a new portable, self-contained data mining tool for remote, personalised and educational applications alike. DataLearner features four elements - this paper, the app available on Google Play, the GPL3-licensed source code on GitHub and a short video on YouTube.
Random Forest vs Neural Network: Which is Better, and When?
Which is better: Random Forest or Neural Network? This is a common question, with a very easy answer: it depends:). I will try to show you when it is good to use Random Forest and when to use Neural Network. First of all, Random Forest (RF) and Neural Network (NN) are different types of algorithms. The RF is the ensemble of decision trees.
Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks
Andriushchenko, Maksym, Hein, Matthias
The problem of adversarial samples has been studied extensively for neural networks. However, for boosting, in particular boosted decision trees and decision stumps there are almost no results, even though boosted decision trees, as e.g. XGBoost, are quite popular due to their interpretability and good prediction performance. We show in this paper that for boosted decision stumps the exact min-max optimal robust loss and test error for an $l_\infty$-attack can be computed in $O(n\,T\log T)$, where $T$ is the number of decision stumps and $n$ the number of data points, as well as an optimal update of the ensemble in $O(n^2\,T\log T)$. While not exact, we show how to optimize an upper bound on the robust loss for boosted trees. Up to our knowledge, these are the first algorithms directly optimizing provable robustness guarantees in the area of boosting. We make the code of all our experiments publicly available at https://github.com/max-andr/provably-robust-boosting