Goto

Collaborating Authors

 Decision Tree Learning


Optimally Pruning Decision Tree Ensembles With Feature Cost

arXiv.org Machine Learning

We consider the problem of learning decision rules for prediction with feature budget constraint. In particular, we are interested in pruning an ensemble of decision trees to reduce expected feature cost while maintaining high prediction accuracy for any test example. We propose a novel 0-1 integer program formulation for ensemble pruning. Our pruning formulation is general - it takes any ensemble of decision trees as input. By explicitly accounting for feature-sharing across trees together with accuracy/cost trade-off, our method is able to significantly reduce feature cost by pruning subtrees that introduce more loss in terms of feature cost than benefit in terms of prediction accuracy gain. Theoretically, we prove that a linear programming relaxation produces the exact solution of the original integer program. This allows us to use efficient convex optimization tools to obtain an optimally pruned ensemble for any given budget. Empirically, we see that our pruning algorithm significantly improves the performance of the state of the art ensemble method BudgetRF.


Efficient Non-greedy Optimization of Decision Trees

Neural Information Processing Systems

Decision trees and randomized forests are widely used in computer vision and machine learning. Standard algorithms for decision tree induction optimize the split functions one node at a time according to some splitting criteria. This greedy procedure often leads to suboptimal trees. In this paper, we present an algorithm for optimizing the split functions at all levels of the tree jointly with the leaf parameters, based on a global objective. We show that the problem of finding optimal linear-combination (oblique) splits for decision trees is related to structured prediction with latent variables, and we formulate a convex-concave upper bound on the tree's empirical loss. Computing the gradient of the proposed surrogate objective with respect to each training exemplar is O(d^2), where d is the tree depth, and thus training deep trees is feasible. The use of stochastic gradient descent for optimization enables effective training with large datasets. Experiments on several classification benchmarks demonstrate that the resulting non-greedy decision trees outperform greedy decision tree baselines.


Logarithmic Time Online Multiclass prediction

Neural Information Processing Systems

We study the problem of multiclass classification with an extremely large number of classes (k), with the goal of obtaining train and test time complexity logarithmic in the number of classes. We develop top-down tree construction approaches for constructing logarithmic depth trees. On the theoretical front, we formulate a new objective function, which is optimized at each node of the tree and creates dynamic partitions of the data which are both pure (in terms of class labels) and balanced. We demonstrate that under favorable conditions, we can construct logarithmic depth trees that have leaves with low label entropy. However, the objective function at the nodes is challenging to optimize computationally. We address the empirical problem with a new online decision tree construction procedure. Experiments demonstrate that this online algorithm quickly achieves improvement in test error compared to more common logarithmic training time approaches, which makes it a plausible method in computationally constrained large-k applications.


Recursive Partitioning for Heterogeneous Causal Effects

arXiv.org Machine Learning

In this paper we study the problems of estimating heterogeneity in causal effects in experimental or observational studies and conducting inference about the magnitude of the differences in treatment effects across subsets of the population. In applications, our method provides a data-driven approach to determine which subpopulations have large or small treatment effects and to test hypotheses about the differences in these effects. For experiments, our method allows researchers to identify heterogeneity in treatment effects that was not specified in a pre-analysis plan, without concern about invalidating inference due to multiple testing. In most of the literature on supervised machine learning (e.g. regression trees, random forests, LASSO, etc.), the goal is to build a model of the relationship between a unit's attributes and an observed outcome. A prominent role in these methods is played by cross-validation which compares predictions to actual outcomes in test samples, in order to select the level of complexity of the model that provides the best predictive power. Our method is closely related, but it differs in that it is tailored for predicting causal effects of a treatment rather than a unit's outcome. The challenge is that the "ground truth" for a causal effect is not observed for any individual unit: we observe the unit with the treatment, or without the treatment, but not both at the same time. Thus, it is not obvious how to use cross-validation to determine whether a causal effect has been accurately predicted. We propose several novel cross-validation criteria for this problem and demonstrate through simulations the conditions under which they perform better than standard methods for the problem of causal effects. We then apply the method to a large-scale field experiment re-ranking results on a search engine.


Cross-Validated Variable Selection in Tree-Based Methods Improves Predictive Performance

arXiv.org Machine Learning

Recursive partitioning approaches producing tree-like models are a long standing staple of predictive modeling, in the last decade mostly as ``sub-learners'' within state of the art ensemble methods like Boosting and Random Forest. However, a fundamental flaw in the partitioning (or splitting) rule of commonly used tree building methods precludes them from treating different types of variables equally. This most clearly manifests in these methods' inability to properly utilize categorical variables with a large number of categories, which are ubiquitous in the new age of big data. Such variables can often be very informative, but current tree methods essentially leave us a choice of either not using them, or exposing our models to severe overfitting. We propose a conceptual framework to splitting using leave-one-out (LOO) cross validation for selecting the splitting variable, then performing a regular split (in our case, following CART's approach) for the selected variable. The most important consequence of our approach is that categorical variables with many categories can be safely used in tree building and are only chosen if they contribute to predictive power. We demonstrate in extensive simulation and real data analysis that our novel splitting approach significantly improves the performance of both single tree models and ensemble methods that utilize trees. Importantly, we design an algorithm for LOO splitting variable selection which under reasonable assumptions does not increase the overall computational complexity compared to CART for two-class classification. For regression tasks, our approach carries an increased computational burden, replacing a O(log(n)) factor in CART splitting rule search with an O(n) term.


A Random Forest Guided Tour

arXiv.org Machine Learning

The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad-hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.


Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

arXiv.org Machine Learning

We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if...then... statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS$_2$ score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial fibrillation. Our model is as interpretable as CHADS$_2$, but more accurate.


Predicting Purchase Decisions in Mobile Free-to-Play Games

AAAI Conferences

Mobile digital games are dominantly released under the freemium business model, but only a small fraction of the players makes any purchases. The ability to predict who will make a purchase enables optimization of marketing efforts, and tailoring customer relationship management to the specific user's profile. Here this challenge is addressed via two models for predicting purchasing players, using a 100,000 player dataset: 1) A classification model focused on predicting whether a purchase will occur or not. 2) a regression model focused on predicting the number of purchases a user will make. Both models are presented within a decision and regression tree framework for building rules that are actionable by companies. To the best of our knowledge, this is the first study investigating purchase decisions in freemium mobile products from a user behavior perspective and adopting behavior-driven learning approaches to this problem.


Towards Real-time Customer Experience Prediction for Telecommunication Operators

arXiv.org Machine Learning

Telecommunications operators (telcos) traditional sources of income, voice and SMS, are shrinking due to customers using over-the-top (OTT) applications such as WhatsApp or Viber. In this challenging environment it is critical for telcos to maintain or grow their market share, by providing users with as good an experience as possible on their network. But the task of extracting customer insights from the vast amounts of data collected by telcos is growing in complexity and scale everey day. How can we measure and predict the quality of a user's experience on a telco network in real-time? That is the problem that we address in this paper. We present an approach to capture, in (near) real-time, the mobile customer experience in order to assess which conditions lead the user to place a call to a telco's customer care center. To this end, we follow a supervised learning approach for prediction and train our 'Restricted Random Forest' model using, as a proxy for bad experience, the observed customer transactions in the telco data feed before the user places a call to a customer care center. We evaluate our approach using a rich dataset provided by a major African telecommunication's company and a novel big data architecture for both the training and scoring of predictive models. Our empirical study shows our solution to be effective at predicting user experience by inferring if a customer will place a call based on his current context. These promising results open new possibilities for improved customer service, which will help telcos to reduce churn rates and improve customer experience, both factors that directly impact their revenue growth.


Predicting SLA Violations in Real Time using Online Machine Learning

arXiv.org Machine Learning

Next generation telecom services will execute on the telecom cloud, which combine the flexibility of today's computing clouds with the service quality of telecom systems. Real-time service assurance will become an integral part in transforming the general and flexible cloud into a robust and highly available cloud that can ensure low latency and agreed service quality to its customers. A service assurance system for telecom services must be able to detect and preferably also predict problems that may violate the agreed service quality. This is a complex task already in legacy systems and will become even more challenging when executing the services in the cloud. Further, the service assurance system must be able to remedy, in real time, these problems once detected. One promising approach to service assurance is based on machine learning, where the service quality and behavior is learned from observations of the system. The ambition is to do automated real-time predictions of the service quality in order to execute mitigation actions in a proactive manner. Machine learning has been used in the past to build prediction models for service quality assurance.