Decision Tree Learning
Decision Forest: A Nonparametric Approach to Modeling Irrational Choice
Chen, Yi-Chun, Mišić, Velibor V.
Customer behavior is often assumed to follow weak rationality, which implies that adding a product to an assortment will not increase the choice probability of another product in that assortment. However, an increasing amount of research has revealed that customers are not necessarily rational when making decisions. In this paper, we study a new nonparametric choice model that relaxes this assumption and can model a wider range of customer behavior, such as decoy effects between products. In this model, each customer type is associated with a binary decision tree, which represents a decision process for making a purchase based on checking for the existence of specific products in the assortment. Together with a probability distribution over customer types, we show that the resulting model, a decision forest, is able to represent any customer choice model, including models that are inconsistent with weak rationality. We theoretically characterize the depth of the forest needed to fit a data set of historical assortments and prove that asymptotically, a forest whose depth scales logarithmically in the number of assortments is sufficient to fit most data sets. We also propose an efficient algorithm for estimating such models from data, based on combining randomization and optimization. Using synthetic data and real transaction data exhibiting non-rational behavior, we show that the model outperforms the multinomial logit and ranking-based models in out-of-sample predictive ability.
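To make the customer-type construction concrete, the Python sketch below evaluates a small decision forest on an assortment. The tuple encoding of trees, the use of product `0` as the no-purchase option, and the helper names `tree_choice` and `forest_choice_probs` are assumptions made for illustration; the two customer types are hypothetical and only show how a tree that checks for a decoy product can violate weak rationality.

```python
# A tree is either ("leaf", product) or ("split", product, if_present, if_absent).
# Product 0 stands for the no-purchase option (an assumption of this sketch).


def tree_choice(tree, assortment):
    """Follow one customer type's decision tree and return the product it picks."""
    while tree[0] == "split":
        _, product, if_present, if_absent = tree
        tree = if_present if product in assortment else if_absent
    choice = tree[1]
    # Guard: a leaf can only select an offered product; otherwise no purchase.
    return choice if choice in assortment else 0


def forest_choice_probs(forest, weights, assortment):
    """Choice probabilities: total weight of the customer types that pick each product."""
    probs = {}
    for tree, weight in zip(forest, weights):
        j = tree_choice(tree, assortment)
        probs[j] = probs.get(j, 0.0) + weight
    return probs


# Hypothetical customer types: one reacts to decoy product 3, one only wants product 2.
decoy_type = ("split", 3,
              ("split", 1, ("leaf", 1), ("leaf", 0)),
              ("split", 2, ("leaf", 2), ("leaf", 0)))
loyal_type = ("split", 2, ("leaf", 2), ("leaf", 0))
forest, weights = [decoy_type, loyal_type], [0.5, 0.5]

without_decoy = forest_choice_probs(forest, weights, {1, 2})
with_decoy = forest_choice_probs(forest, weights, {1, 2, 3})
print(without_decoy.get(1, 0.0), with_decoy.get(1, 0.0))  # 0.0 0.5
```

Adding product 3 raises the probability of choosing product 1 from 0 to 0.5, which weak rationality rules out.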
Regression-Enhanced Random Forests
Zhang, Haozhe, Nettleton, Dan, Zhu, Zhengyuan
In the last few years, there have been many methodological and theoretical advances in the random forests approach. Some methodological developments and extensions include case-specific random forests [19], multivariate random forests [16], quantile regression forests [13], random survival forests [11], enriched random forests for microarray data [1], and predictor augmentation in random forests [18], among others. On the theoretical side, the statistical and asymptotic properties of random forests have been investigated intensively. Advances have been made in areas such as consistency [2] [15], variable selection [8], and the construction of confidence intervals [17]. Although RF methodology has proven itself to be a reliable predictive approach in many application areas [3][10], there are some cases where random forests may suffer. First, as a fully nonparametric predictive algorithm, random forests may not efficiently incorporate known relationships between the response and the predictors. Second, random forests may fail in extrapolation problems where predictions are required at points outside the domain of the training dataset. For regression problems, a random forest prediction is an average of the predictions produced by the trees in the forest.
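The extrapolation problem follows from the averaging: each regression tree predicts the mean response of the training points in a leaf, so an average of trees can never leave the range of the observed responses. The pure-Python sketch below illustrates this with bagged depth-1 trees (stumps), a deliberately simplified stand-in for a full random forest; the function names and the toy data y = 2x are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on 1-D inputs by minimizing squared error."""
    pairs = sorted(zip(xs, ys))
    mean_all = sum(ys) / len(ys)
    # Start from the no-split tree: every input goes to the left leaf.
    best = (sum((y - mean_all) ** 2 for y in ys), float("inf"), mean_all, mean_all)
    for k in range(1, len(pairs)):
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (sse, threshold, ml, mr)
    _, threshold, ml, mr = best
    # Each leaf predicts a mean of training responses.
    return lambda x: ml if x <= threshold else mr


def fit_forest(xs, ys, n_trees=50, seed=0):
    """Bagged stumps: each tree sees a bootstrap sample; the forest averages the trees."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


xs = [i / 3 for i in range(31)]   # training inputs in [0, 10]
ys = [2 * x for x in xs]          # true relationship y = 2x
forest = fit_forest(xs, ys)
print(forest(100.0) <= max(ys))   # True: the forest cannot reach the true value 200
```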
Random Pairwise Shapelets Forest
Shi, Mohan, Wang, Zhihai, Yuan, Jidong, Liu, Haiyang
A shapelet is a discriminative subsequence of a time series. An advanced shapelet-based method embeds shapelets into an accurate and fast random forest. However, this approach has several limitations. First, the random shapelet forest incurs a large training cost for split threshold searching. Second, a single shapelet provides limited information for only one branch of the decision tree, resulting in insufficient accuracy and interpretability. Third, the randomized ensemble reduces interpretability. To address these issues, this paper presents Random Pairwise Shapelets Forest (RPSF). RPSF combines a pair of shapelets from different classes to construct a random forest. It omits threshold searching, which makes it more efficient, and includes more information at each node of the forest, which makes it more effective. Moreover, a discriminability metric, Decomposed Mean Decrease Impurity (DMDI), is proposed to identify the influential regions for every class. Extensive experiments show that RPSF improves the accuracy and training speed of shapelet-based forests. Case studies demonstrate the interpretability of our method.
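A pairwise node can be illustrated with a threshold-free split rule. The Python sketch below assumes (this detail is not stated above) that a node routes a series by comparing its subsequence distance to the two shapelets, one from each class, so no split threshold has to be searched. The subsequence distance is the usual minimum Euclidean distance over sliding windows; the function names and toy series are hypothetical.

```python
import math


def subsequence_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any equal-length window of the series."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, d)
    return math.sqrt(best)


def pairwise_split(series, shapelet_a, shapelet_b):
    """Route left if the series is closer to shapelet_a than to shapelet_b (assumed rule)."""
    d_a = subsequence_distance(series, shapelet_a)
    d_b = subsequence_distance(series, shapelet_b)
    return "left" if d_a <= d_b else "right"


peak = [0, 1, 0]   # hypothetical shapelet from class A
dip = [0, -1, 0]   # hypothetical shapelet from class B
print(pairwise_split([0, 0, 0, 1, 0, 0], peak, dip))  # left
print(pairwise_split([0, 0, -1, 0, 0], peak, dip))    # right
```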
Integrating Association Rules with Decision Trees in Object-Relational Databases
Research has provided evidence that associative classification produces more accurate results than other classification models. Classification Based on Associations (CBA) is one of the best-known associative classification algorithms and generates accurate classifiers. However, current associative classification algorithms run outside the database, which reduces the flexibility of enterprise analytics systems. This paper implements CBA in an Oracle database using two variant models: hardcoding CBA in the Oracle Data Mining (ODM) package, and integrating the Oracle Apriori model with the Oracle Decision Tree model. We compared the performance of the proposed models with Naive Bayes, Support Vector Machine, Random Forests, and Decision Tree over 18 datasets from UCI. Results showed that our models outperformed the original CBA model by 1 percent and are competitive with the selected classification models over the benchmark datasets.
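The core of a CBA-style classifier can be sketched in memory, independently of Oracle. In the Python sketch below, class association rules (itemset → class) are kept if they meet support and confidence thresholds, sorted by confidence, then support, then rule length, and a record receives the class of the first matching rule. It omits CBA's database-coverage pruning and everything Oracle-specific; the thresholds, `max_len`, the function names, and the toy data are hypothetical.

```python
from itertools import combinations


def mine_rules(transactions, labels, min_support, min_conf, max_len=2):
    """Mine class association rules (itemset -> class) meeting support and confidence thresholds."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    rules = []
    for size in range(1, max_len + 1):
        for itemset in combinations(items, size):
            antecedent = frozenset(itemset)
            # Class labels of the records that contain every item of the antecedent.
            covered = [lab for t, lab in zip(transactions, labels) if antecedent <= t]
            for cls in sorted(set(covered)):
                hits = covered.count(cls)
                support, confidence = hits / n, hits / len(covered)
                if support >= min_support and confidence >= min_conf:
                    rules.append((antecedent, cls, confidence, support))
    # CBA-style precedence: higher confidence, then higher support, then shorter rules.
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0])))
    return rules


def classify(rules, default_class, record):
    """Return the class of the highest-precedence rule whose antecedent the record contains."""
    for antecedent, cls, _, _ in rules:
        if antecedent <= record:
            return cls
    return default_class


transactions = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}]
labels = ["yes", "yes", "no", "no"]
rules = mine_rules(transactions, labels, min_support=0.25, min_conf=0.9)
print(classify(rules, "unknown", {"a"}))  # yes
```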
Continuous-Time Birth-Death MCMC for Bayesian Regression Tree Models
Mohammadi, Reza, Pratola, Matthew, Kaptein, Maurits
Decision trees are flexible models that are well suited for many statistical regression problems. In a Bayesian framework for regression trees, Markov chain Monte Carlo (MCMC) search algorithms are required to generate samples of tree models according to their posterior probabilities. The critical component of such an MCMC algorithm is the construction of good Metropolis-Hastings steps for updating the tree topology. However, such algorithms frequently suffer from local mode stickiness and poor mixing, and as a result they are slow to converge. Hitherto, authors have primarily used discrete-time birth/death mechanisms for Bayesian (sums of) regression tree models to explore the model space. These algorithms are efficient only if the acceptance rate is high, which is not always the case. Here we overcome this issue by developing a new search algorithm based on a continuous-time birth-death Markov process. This search algorithm explores the model space by jumping between parameter spaces corresponding to different tree structures. In the proposed algorithm, the moves between models are always accepted, which can dramatically improve the convergence and mixing properties of the MCMC algorithm. We provide theoretical support for the algorithm in the setting of Bayesian regression tree models and demonstrate its performance.
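The "always accepted" property of a continuous-time birth-death sampler can be seen in a toy setting. The Python sketch below is not a tree sampler: it targets a Poisson(λ) distribution over a count k (think of k as a number of components), uses a constant birth rate and death rates chosen to satisfy detailed balance, executes every jump, and corrects for the target by weighting each visited state with its random holding time. In a tree model, births and deaths would instead add or remove splits with posterior-dependent rates; the function name and parameters here are hypothetical.

```python
import random


def ctbd_poisson(lam, n_jumps, seed=0):
    """Continuous-time birth-death sampler whose time-weighted occupancy targets Poisson(lam)."""
    rng = random.Random(seed)
    k = 0
    holding = {}  # state -> total time spent there
    for _ in range(n_jumps):
        birth = 1.0
        # Detailed balance: pi(k) * birth = pi(k + 1) * death(k + 1), so death(k) = k / lam.
        death = k / lam
        total = birth + death
        holding[k] = holding.get(k, 0.0) + rng.expovariate(total)  # exponential waiting time
        # Every jump is taken; the move type is chosen in proportion to its rate.
        k = k + 1 if rng.random() < birth / total else k - 1
    elapsed = sum(holding.values())
    return {state: t / elapsed for state, t in holding.items()}


probs = ctbd_poisson(lam=3.0, n_jumps=200_000)
mean = sum(k * p for k, p in probs.items())
print(mean)  # should be close to 3
```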
Visualizing the decision-making process in deep neural decision forest
Li, Shichao, Cheng, Kwang-Ting
Deep neural decision forest (NDF) has achieved remarkable performance on various vision tasks by combining decision trees with deep representation learning. In this work, we first trace the decision-making process of this model and visualize saliency maps to understand which portions of the input influence it most, for both classification and regression problems. We then apply NDF to a multi-task coordinate regression problem and demonstrate the distribution of routing probabilities, which is vital for interpreting NDF but had not been shown for regression problems. The pre-trained model and code for visualization will be available at https://github.com/Nicholasli1995/VisualizingNDF
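In the standard NDF formulation, each internal node n outputs a split probability d_n(x) = σ(f_n(x)) computed from the deep features, and the routing probability μ_ℓ(x) of leaf ℓ is the product of the branch probabilities along its path; a prediction is then the μ-weighted combination of the leaf outputs. The Python sketch below computes these routing probabilities for one complete soft tree. The breadth-first node indexing and the convention that d_n is the probability of going left are assumptions of the sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def leaf_routing_probs(split_logits):
    """Routing probability mu_l of every leaf of a complete soft binary tree.

    split_logits[n] is f_n(x) for internal node n in breadth-first order (root = 0);
    a tree of depth D needs 2**D - 1 logits and has 2**D leaves.
    """
    probs = [1.0]          # probability of reaching each node on the current level
    level_start = 0
    while level_start < len(split_logits):
        width = len(probs)
        next_probs = []
        for offset, p in enumerate(probs):
            d = sigmoid(split_logits[level_start + offset])  # probability of going left
            next_probs.extend([p * d, p * (1.0 - d)])
        probs = next_probs
        level_start += width
    return probs


print(leaf_routing_probs([0.0, 0.0, 0.0]))  # [0.25, 0.25, 0.25, 0.25]
```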
Scalable and Efficient Hypothesis Testing with Random Forests
Coleman, Tim, Peng, Wei, Mentch, Lucas
Throughout the last decade, random forests have established themselves as among the most accurate and popular supervised learning methods. While their black-box nature has made their mathematical analysis difficult, recent work has established important statistical properties like consistency and asymptotic normality by considering subsampling in lieu of bootstrapping. Though such results open the door to traditional inference procedures, all formal methods suggested thus far place severe restrictions on the testing framework, and their computational overhead precludes their practical scientific use. Here we propose a permutation-style testing approach to formally assess feature significance. We establish asymptotic validity of the test via exchangeability arguments and show that the test maintains high power with orders of magnitude fewer computations. Just as importantly, the procedure scales easily to big data settings where large training and testing sets may be employed without the need to construct additional models. Simulations and applications to ecological data where random forests have recently shown promise are provided.
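The exchangeability idea behind a permutation-style test can be sketched as follows. Assume two ensembles have already been grown, one on the original data and one with the feature of interest permuted, and that each tree's predictions on a common test set are available. If the feature is irrelevant, the trees of the two ensembles are exchangeable, so randomly reassigning trees between the two groups yields a null distribution for the difference in ensemble test error. This Python sketch is a simplified illustration under those assumptions, not necessarily the paper's exact statistic or procedure; the function names and toy predictions are hypothetical.

```python
import random


def ensemble_mse(tree_preds, y):
    """Test MSE of the ensemble that averages the given per-tree prediction lists."""
    n_trees = len(tree_preds)
    total = 0.0
    for i, yi in enumerate(y):
        avg = sum(p[i] for p in tree_preds) / n_trees
        total += (avg - yi) ** 2
    return total / len(y)


def tree_permutation_pvalue(trees_orig, trees_perm, y, n_perm=1000, seed=0):
    """One-sided p-value for H0: the feature is irrelevant, so all trees are exchangeable.

    trees_orig: per-tree test predictions from the forest grown on the original data.
    trees_perm: per-tree test predictions from the forest grown with the feature permuted.
    """
    rng = random.Random(seed)
    observed = ensemble_mse(trees_perm, y) - ensemble_mse(trees_orig, y)
    pooled = list(trees_orig) + list(trees_perm)
    k = len(trees_orig)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of trees to the two groups
        stat = ensemble_mse(pooled[k:], y) - ensemble_mse(pooled[:k], y)
        if stat >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


y = [float(i) for i in range(20)]
good = [[yi + 0.1 * ((t + i) % 3 - 1) for i, yi in enumerate(y)] for t in range(10)]
biased = [[yi + 2.0 for yi in y] for _ in range(10)]
print(tree_permutation_pvalue(good, biased, y))  # small: the permuted-feature forest is clearly worse
```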
Learning Optimal Decision Trees from Large Datasets
Inferring a decision tree from a given dataset is one of the classic problems in machine learning. This problem consists of building, from a labelled dataset, a tree in which each leaf corresponds to a class and a path between the tree root and a leaf corresponds to a conjunction of features to be satisfied in this class. Following the principle of parsimony, we want to infer a minimal tree consistent with the dataset. Unfortunately, inferring an optimal decision tree is known to be NP-complete for several definitions of optimality. Hence, the majority of existing approaches rely on heuristics, and the few exact inference approaches do not work on large datasets. In this paper, we propose a novel approach for inferring a decision tree of minimum depth based on the incremental generation of Boolean formulas. The experimental results indicate that it scales sufficiently well and that its running time grows slowly with the size of the dataset.
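The paper encodes the search as incrementally generated Boolean formulas; the Python sketch below swaps in a plain exhaustive search so that only the underlying question is visible: for increasing depth d = 0, 1, 2, ..., does any tree of depth at most d over binary features classify every training row correctly? The search is exponential and suited only to tiny inputs; the function names are hypothetical.

```python
def consistent_at_depth(rows, labels, depth):
    """True if some decision tree of depth <= `depth` over 0/1 features fits every row exactly."""
    if len(set(labels)) <= 1:
        return True                      # a single leaf suffices
    if depth == 0:
        return False                     # mixed labels but no splits left
    for f in range(len(rows[0])):
        left = [i for i, r in enumerate(rows) if r[f] == 0]
        right = [i for i, r in enumerate(rows) if r[f] == 1]
        if not left or not right:
            continue                     # feature f does not separate these rows
        if (consistent_at_depth([rows[i] for i in left], [labels[i] for i in left], depth - 1)
                and consistent_at_depth([rows[i] for i in right], [labels[i] for i in right], depth - 1)):
            return True
    return False


def minimum_depth(rows, labels, max_depth=10):
    """Smallest depth admitting a consistent tree, trying d = 0, 1, 2, ... in turn."""
    for d in range(max_depth + 1):
        if consistent_at_depth(rows, labels, d):
            return d
    return None                          # e.g. identical rows with different labels


xor_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(minimum_depth(xor_rows, [0, 1, 1, 0]))  # 2
```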
Quantifying Interpretability of Arbitrary Machine Learning Models Through Functional Decomposition
Molnar, Christoph, Casalicchio, Giuseppe, Bischl, Bernd
To obtain interpretable machine learning models, either interpretable models are constructed from the outset (e.g., shallow decision trees, rule lists, or sparse generalized linear models) or post-hoc interpretation methods (e.g., partial dependence or ALE plots) are employed. Both approaches have disadvantages. While the former can restrict the hypothesis space too conservatively, leading to potentially suboptimal solutions, the latter can produce overly verbose or misleading results if the resulting model is too complex, especially with respect to feature interactions. We propose to make the compromise between predictive power and interpretability explicit by quantifying the complexity/interpretability of machine learning models. Based on functional decomposition, we propose measures of the number of features used, interaction strength, and main effect complexity. We show that post-hoc interpretation of models that minimize the three measures becomes more reliable and compact. Furthermore, we demonstrate the application of such measures in a multi-objective optimization approach that considers predictive power and interpretability at the same time.
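The first measure can be approximated model-agnostically. The Python sketch below is a simplified proxy, not necessarily the paper's estimator: it counts a feature as used if permuting its values across the data rows changes at least one prediction. The permutation is a deterministic cyclic shift by one row (an assumption chosen so the sketch is reproducible), and the function name and toy model are hypothetical.

```python
def number_of_features_used(predict, X):
    """Count features whose values, when permuted across rows, change at least one prediction.

    The permutation is a cyclic shift of the column by one row; it can miss a feature
    if neighbouring rows happen to share values.
    """
    n_rows = len(X)
    base = [predict(row) for row in X]
    used = 0
    for j in range(len(X[0])):
        modified = []
        for i, row in enumerate(X):
            new_row = list(row)
            new_row[j] = X[(i + 1) % n_rows][j]  # value of feature j taken from the next row
            modified.append(new_row)
        if any(predict(m) != b for m, b in zip(modified, base)):
            used += 1
    return used


def model(x):
    # Hypothetical model that ignores feature 1.
    return 2 * x[0] + x[2]


X = [[i, (i * i) % 7, (3 * i) % 5] for i in range(20)]
print(number_of_features_used(model, X))  # 2
```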
Unveiling phase transitions with machine learning
Canabarro, Askery, Fanchini, Felipe Fernandes, Malvezzi, André Luiz, Pereira, Rodrigo, Chaves, Rafael
The classification of phase transitions is a central and challenging task in condensed matter physics. Typically, it relies on the identification of order parameters and the analysis of singularities in the free energy and its derivatives. Here, we propose an alternative framework to identify quantum phase transitions, employing both unsupervised and supervised machine learning techniques. Using the axial next-nearest neighbor Ising (ANNNI) model as a benchmark, we show how unsupervised learning can detect three phases (ferromagnetic, paramagnetic, and a cluster of the antiphase with the floating phase) as well as two distinct regions within the paramagnetic phase. Employing supervised learning, we show that transfer learning becomes possible: a machine trained only with nearest-neighbour interactions can learn to identify a new type of phase occurring when next-nearest-neighbour interactions are introduced. All our results rely on few and low-dimensional input data (up to twelve lattice sites), thus providing a computationally friendly and general framework for the study of phase transitions in many-body systems.
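For reference, one common way to write the quantum ANNNI chain with N sites is shown below, where J_1 is the nearest-neighbour coupling, κ sets the strength of the frustrating next-nearest-neighbour coupling relative to J_1, and h is the transverse field. The sign and parameter conventions are an assumption here and may differ from those used in the paper. Setting κ = 0 recovers the transverse-field Ising chain with nearest-neighbour interactions only, the setting in which the supervised model is trained before being applied to κ > 0.

```latex
H = -J_1 \sum_{i=1}^{N} \left( \sigma^z_i \sigma^z_{i+1}
      - \kappa \, \sigma^z_i \sigma^z_{i+2}
      + h \, \sigma^x_i \right)
```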