Goto

Collaborating Authors

 Decision Tree Learning


Cost-Sensitive Decision Trees for Imbalanced Classification

#artificialintelligence

The decision tree algorithm is effective for balanced classification, although it does not perform well on imbalanced datasets. The split points of the tree are chosen to best separate examples into two groups with minimum mixing. When both groups are dominated by examples from one class, the criterion used to select a split point will see good separation, when in fact, the examples from the minority class are being ignored. This problem can be overcome by modifying the criterion used to evaluate split points to take the importance of each class into account, referred to generally as the weighted split-point or weighted decision tree. In this tutorial, you will discover the weighted decision tree for imbalanced classification.


Interpretable Companions for Black-Box Models

arXiv.org Machine Learning

We present an interpretable companion model for any pre-trained black-box classifiers. The idea is that for any input, a user can decide to either receive a prediction from the black-box model, with high accuracy but no explanations, or employ a companion rule to obtain an interpretable prediction with slightly lower accuracy. The companion model is trained from data and the predictions of the black-box model, with the objective combining area under the transparency--accuracy curve and model complexity. Our model provides flexible choices for practitioners who face the dilemma of choosing between always using interpretable models and always using black-box models for a predictive task, so users can, for any given input, take a step back to resort to an interpretable prediction if they find the predictive performance satisfying, or stick to the black-box model if the rules are unsatisfying. To show the value of companion models, we design a human evaluation on more than a hundred people to investigate the tolerable accuracy loss to gain interpretability for humans.


Ensemble Methods for Decision Trees

#artificialintelligence

Decision Trees are popular Machine Learning algorithms used for both regression and classification tasks. Their popularity mainly arises from their interpretability and representability, as they mimic the way the human brain takes decisions. However, to be interpretable, they pay a price in terms of prediction accuracy. To overcome this caveat, some techniques have been developed, with the goal of creating strong and robust models starting from'poor' models. Those techniques are known as'ensemble' methods and, in this article, I'm going to talk about three of them: Bagging, Random Forest and Boosting.


Stochastic tree ensembles for regularized nonlinear regression

arXiv.org Machine Learning

Tree-based algorithms for supervised learning, such as Classification and Regression Trees (CART) (Breiman et al., 1984), random forests (Breiman, 1996, 2001), adaBoost (Freund and Schapire, 1997), and gradient boosting (Breiman, 1997; Friedman, 2001, 2002), are widely used for applied supervised learning. As a whole, these methods are popular in applied settings due to their speed and accuracy in mean estimation and out-of-sample prediction tasks. One limitation of such methods is their well-known sensitivity to tuning parameters, which require costly cross-validation to optimize. Bayesian additive regression trees (BART) (Chipman et al., 2007, 2010) is a popular model-based alternative that is often more accurate than other treebased methods; specifically, BART boasts valuable robustness to the choice of tuning-parameters. However, relative to random forests and boosting, BART's wider adoption has been slowed by its more severe computational demands, owing to its reliance on a random walk Metropolis-Hastings Markov chain Monte Carlo (MCMC) algorithm. Despite this limitation, BART has inspired a considerable body of research in recent years.


Machine Learning Tutorial Part 4 Machine Learning For Beginners - Python Decision Tree

#artificialintelligence

Sign in to report inappropriate content. Learn how to implement a decision tree using the Python programming language. You will learn how to train and display you implementation of the decision tree classifier.


The Math of Decision Trees, Random Forest and Feature Importance in Scikit-learn and Spark

#artificialintelligence

This post attempts to consolidate information on tree algorithms and their implementations in Scikit-learn and Spark. In particular, it was written to provide clarification on how feature importance is calculated.


Privacy-Preserving Boosting in the Local Setting

arXiv.org Machine Learning

In machine learning, boosting is one of the most popular methods that designed to combine multiple base learners to a superior one. The well-known Boosted Decision Tree classifier, has been widely adopted in many areas. In the big data era, the data held by individual and entities, like personal images, browsing history and census information, are more likely to contain sensitive information. The privacy concern raises when such data leaves the hand of the owners and be further explored or mined. Such privacy issue demands that the machine learning algorithm should be privacy aware. Recently, Local Differential Privacy is proposed as an effective privacy protection approach, which offers a strong guarantee to the data owners, as the data is perturbed before any further usage, and the true values never leave the hands of the owners. Thus the machine learning algorithm with the private data instances is of great value and importance. In this paper, we are interested in developing the privacy-preserving boosting algorithm that a data user is allowed to build a classifier without knowing or deriving the exact value of each data samples. Our experiments demonstrate the effectiveness of the proposed boosting algorithm and the high utility of the learned classifiers.


Robust Boosting for Regression Problems

arXiv.org Machine Learning

The gradient boosting algorithm constructs a regression estimator using a linear combination of simple "base learners". In order to obtain a robust non-parametric regression estimator that is scalable to high dimensional problems we propose a robust boosting algorithm based on a two-stage approach, similar to what is done for robust linear regression: we first minimize a robust residual scale estimator, and then improve its efficiency by optimizing a bounded loss function. Unlike previous proposals, our algorithm does not need to compute an ad-hoc residual scale estimator in each step. Since our loss functions are typically non-convex, we propose initializing our algorithm with an $L_1$ regression tree, which is fast to compute. We also introduce a robust variable importance metric for variable selection that is calculated via a permutation procedure. Through simulated and real data experiments, we compare our method against gradient boosting with squared loss and other robust boosting methods in the literature. With clean data, our method works equally well as gradient boosting with the squared loss. With symmetric and asymmetrically contaminated data, we show that our proposed method outperforms in terms of prediction error and variable selection accuracy.


A Survey on Causal Inference

arXiv.org Artificial Intelligence

Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Embraced with the rapidly developed machine learning area, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well known causal inference framework. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including the applications in advertising, recommendation, medicine and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are also summarized, which facilitate researchers and practitioners to explore, evaluate and apply the causal inference methods.


Profit-oriented sales forecasting: a comparison of forecasting techniques from a business perspective

arXiv.org Machine Learning

Choosing the technique that is the best at forecasting your data, is a problem that arises in any forecasting application. Decades of research have resulted into an enormous amount of forecasting methods that stem from statistics, econometrics and machine learning (ML), which leads to a very difficult and elaborate choice to make in any forecasting exercise. This paper aims to facilitate this process for high-level tactical sales forecasts by comparing a large array of techniques for 35 times series that consist of both industry data from the Coca-Cola Company and publicly available datasets. However, instead of solely focusing on the accuracy of the resulting forecasts, this paper introduces a novel and completely automated profit-driven approach that takes into account the expected profit that a technique can create during both the model building and evaluation process. The expected profit function that is used for this purpose, is easy to understand and adaptable to any situation by combining forecasting accuracy with business expertise. Furthermore, we examine the added value of ML techniques, the inclusion of external factors and the use of seasonal models in order to ascertain which type of model works best in tactical sales forecasting. Our findings show that simple seasonal time series models consistently outperform other methodologies and that the profit-driven approach can lead to selecting a different forecasting model.