Decision Tree Learning
Using Decision Trees to Identify White Nationalists
Simple yet effective, they are easily visualized, intuitively understood, and a great place to start when trying to understand what this artificial intelligence stuff is all about. Right now, out in the real-world, decision trees are being used to predict which customers will default on a loan, which credit card transactions are fraudulent, and which stocks are a good buy this week. This technology is already embedded all around us. Smart corporations have already been using this stuff for years, and now government is getting in on the action. As these systems become more sophisticated, and more embedded in every aspect of our daily lives, being an informed citizen means having at least a basic understanding of this stuff.
ropenscilabs/proxy-bias-vignette
Machine Learning systems often inherit biases against protected classes and historically disparaged groups via their training data [1]. Though some biases in features are straightforward to detect (ex: age, gender, race), others are not explicit and rely on subtle correlations in machine learning algorithms to understand. The incorporation of unintended bias into predictive models is called proxy discrimination. In this vignette, we will be implementing an example machine learning model using decision trees, and determining whether its classification for loan recipients is biased against certain groups. We will explore several ways of detecting unintentional bias and removing it from our predictive model.
Super learning in the SAS system
Background and objective: Stacking is an ensemble machine learning method that averages predictions from multiple other algorithms, such as generalized linear models and regression trees. A recent iteration of stacking, called super learning, has been developed as a general approach to black box supervised learning and has seen frequent usage, in part due to the availability of an R package. I develop super learning in the SAS software system using a new macro, and demonstrate its performance relative to the R package. Methods: I follow closely previous work using the R SuperLearner package and assess the performance of super learning in a number of domains. I compare the R package with the new SAS macro in a small set of simulations assessing curve fitting in a prediction model, a set of 14 publicly available datasets to assess cross-validated, expected loss, and data from a randomized trial of job seekers' training to assess the utility of super learning in causal inference using inverse probability weighting. Results: Across the simulated data and the publicly available data, the macro performed similarly to the R package, even with a different set of potential algorithms available natively in R and SAS. The example with inverse probability weighting demonstrated the ability of the SAS macro to include algorithms developed in R. Conclusions: The super learner macro performs as well as the R package at a number of tasks. Further, by extending the macro to include the use of R packages, the macro can leverage both the robust, enterprise oriented procedures in SAS and the nimble, cutting edge packages in R. In the spirit of ensemble learning, this macro extends the potential library of algorithms beyond a single software system and provides a simple avenue into machine learning in SAS.
Confounding-Robust Policy Improvement
We study the problem of learning personalized decision policies from observational data while accounting for possible unobserved confounding in the data-generating process. Unlike previous approaches which assume unconfoundedness, i.e., no unobserved confounders affected treatment assignment as well as outcome, we calibrate policy learning for realistic violations of this unverifiable assumption with uncertainty sets motivated by sensitivity analysis in causal inference. Our framework for confounding-robust policy improvement optimizes the minimax regret of a candidate policy against a baseline or reference "status quo" policy, over a uncertainty set around nominal propensity weights. We prove that if the uncertainty set is well-specified, robust policy learning can do no worse than the baseline, and only improve if the data supports it. We characterize the adversarial subproblem and use efficient algorithmic solutions to optimize over parametrized spaces of decision policies such as logistic treatment assignment. We assess our methods on synthetic data and a large clinical trial, demonstrating that confounded selection can hinder policy learning and lead to unwarranted harm, while our robust approach guarantees safety and focuses on well-evidenced improvement.
Verifiable Reinforcement Learning via Policy Extraction
Bastani, Osbert, Pu, Yewen, Solar-Lezama, Armando
While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. We propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using existing techniques (since they are highly structured). The challenge is that decision tree policies are difficult to train. We propose VIPER, an algorithm that combines ideas from model compression and imitation learning to learn decision tree policies guided by a DNN policy (called the oracle) and its Q-function, and show that it substantially outperforms two baselines. We use VIPER to (i) learn a provably robust decision tree policy for a variant of Atari Pong with a symbolic state space, (ii) learn a decision tree policy for a toy game based on Pong that provably never loses, and (iii) learn a provably stable decision tree policy for cart-pole. In each case, the decision tree policy achieves performance equal to that of the original DNN policy.
Introduction to Machine Learning for non-developers
There are several types of predictive models. These models usually have several input columns and one target or outcome column, which is the variable to be predicted. So basically, a model performs mapping between inputs and an output, finding-mysteriously, sometimes-the relationships between the input variables in order to predict any other variable. As you may notice, it has some commonalities with a human being who reads the environment processes the information and performs a certain action. It's about becoming familiar with one of the most-used predictive models: Random Forest (official algorithm site), implemented in R, one of the most-used models due to its simplicity in tuning and robustness across many different types of data.
Adaptively Pruning Features for Boosted Decision Trees
Aziz, Maryam, Anderton, Jesse, Aslam, Javed
Boosted decision trees enjoy popularity in a variety of applications; however, for large-scale datasets, the cost of training a decision tree in each round can be prohibitively expensive. Inspired by ideas from the multi-arm bandit literature, we develop a highly efficient algorithm for computing exact greedy-optimal decision trees, outperforming the state-of-the-art Quick Boost method. We further develop a framework for deriving lower bounds on the problem that applies to a wide family of conceivable algorithms for the task (including our algorithm and Quick Boost), and we demonstrate empirically on a wide variety of data sets that our algorithm is near-optimal within this family of algorithms. We also derive a lower bound applicable to any algorithm solving the task, and we demonstrate that our algorithm empirically achieves performance close to this best-achievable lower bound.
Strict Very Fast Decision Tree: a memory conservative algorithm for data stream mining
da Costa, Victor Guilherme Turrisi, de Carvalho, André Carlos Ponce de Leon Ferreira, Junior, Sylvio Barbon
Dealing with memory and time constraints are current challenges when learning from data streams with a massive amount of data. Many algorithms have been proposed to handle these difficulties, among them, the Very Fast Decision Tree (VFDT) algorithm. Although the VFDT has been widely used in data stream mining, in the last years, several authors have suggested modifications to increase its performance, putting aside memory concerns by proposing memory-costly solutions. Besides, most data stream mining solutions have been centred around ensembles, which combine the memory costs of their weak learners, usually VFDTs. To reduce the memory cost, keeping the predictive performance, this study proposes the Strict VFDT (SVFDT), a novel algorithm based on the VFDT. The SVFDT algorithm minimises unnecessary tree growth, substantially reducing memory usage and keeping competitive predictive performance. Moreover, since it creates much more shallow trees than VFDT, SVFDT can achieve a shorter processing time. Experiments were carried out comparing the SVFDT with the VFDT in 11 benchmark data stream datasets. This comparison assessed the trade-off between accuracy, memory, and processing time. Statistical analysis showed that the proposed algorithm obtained similar predictive performance and significantly reduced processing time and memory use. Thus, SVFDT is a suitable option for data stream mining with memory and time limitations, recommended as a weak learner in ensemble-based solutions.
Prediction Rule Reshaping
Bonakdarpour, Matt, Chatterjee, Sabyasachi, Barber, Rina Foygel, Lafferty, John
Two methods are proposed for high-dimensional shape-constrained regression and classification. These methods reshape pre-trained prediction rules to satisfy shape constraints like monotonicity and convexity. The first method can be applied to any pre-trained prediction rule, while the second method deals specifically with random forests. In both cases, efficient algorithms are developed for computing the estimators, and experiments are performed to demonstrate their performance on four datasets. We find that reshaping methods enforce shape constraints without compromising predictive accuracy.
A One-Class Decision Tree Based on Kernel Density Estimation
Itani, Sarah, Lecron, Fabian, Fortemps, Philippe
Many data science issues have to be addressed through unbalanced datasets. Indeed, it may be quite affordable to gather data on the representatives of a given pathology in medicine, or positive operating scenarios of machines in the industry [1]. The related complementary occurrences are, by contrast, scarce and/or expensive to raise. The practice of One-Class Classification (OCC) has been developed within this consideration [1, 2]. One-class classifiers are trained on a single class sample, in the possible presence of a few counterexamples. The related issue consists of understanding and isolating a given class from the rest of the universe. The resulting model allows to predict target (or positive) patterns and to reject outlier (or negative) ones. One-Class Support Vector Machine (OCSVM) is a popular OCC method [3, 4]. Statistics-based techniques such as Gaussian models and Kernel Density Estimation (KDE) [5] are also commonly considered as respectively parametric and nonparametric approaches to estimate a sample distribution.