Decision Tree Learning
Gradient Regularized Budgeted Boosting
Xu, Zhixiang Eddie, Kusner, Matt J., Weinberger, Kilian Q., Zheng, Alice X.
As machine learning transitions increasingly towards real-world applications, controlling the test-time cost of algorithms becomes more and more crucial. Recent work, such as the Greedy Miser and Speedboost, incorporates test-time budget constraints into the training procedure and learns classifiers that provably stay within budget (in expectation). However, so far, these algorithms are limited to the supervised learning scenario where sufficient amounts of labeled data are available. In this paper we investigate the common scenario where labeled data is scarce but unlabeled data is available in abundance. We propose an algorithm that leverages the unlabeled data (through Laplace smoothing) and learns classifiers with budget constraints. Our model, based on gradient boosted regression trees (GBRT), is, to our knowledge, the first algorithm for semi-supervised budgeted learning.
Learning Interpretable Models with Causal Guarantees
Machine learning has shown much promise in helping improve the quality of medical, legal, and economic decision-making. In these applications, machine learning models must satisfy two important criteria: (i) they must be causal, since the goal is typically to predict individual treatment effects, and (ii) they must be interpretable, so that human decision makers can validate and trust the model predictions. There has recently been much progress along each direction independently, yet the state-of-the-art approaches are fundamentally incompatible. We propose a framework for learning causal interpretable models---from observational data---that can be used to predict individual treatment effects. Our framework can be used with any algorithm for learning interpretable models. Furthermore, we prove an error bound on the treatment effects predicted by our model. Finally, in an experiment on real-world data, we show that the models trained using our framework significantly outperform a number of baselines.
Stochastic Gradient Trees
Gouk, Henry, Pfahringer, Bernhard, Frank, Eibe
We present an online algorithm that induces decision trees using gradient information as the source of supervision. In contrast to previous approaches to gradient-based tree learning, we do not require soft splits or construction of a new tree for every update. In experiments, our method performs comparably to standard incremental classification trees and outperforms state-of-the-art incremental regression trees. We also show how the method can be used to construct a novel type of neural network layer suited to learning representations from tabular data, and find that it increases the accuracy of multiclass and multi-label classification.
Adaptive Exact Learning of Decision Trees from Membership Queries
Bshouty, Nader H., Haddad-Zaknoon, Catherine A.
In this paper we study the adaptive learnability of decision trees of depth at most $d$ from membership queries. This has many applications in automated scientific discovery, such as drug development and the software update problem. Feldman solves the problem with a randomized polynomial-time algorithm that asks $\tilde O(2^{2d})\log n$ queries, and Kushilevitz-Mansour with a deterministic polynomial-time algorithm that asks $2^{18d+o(d)}\log n$ queries. We improve the query complexity of both algorithms. We give a randomized polynomial-time algorithm that asks $\tilde O(2^{2d}) + 2^{d}\log n$ queries and a deterministic polynomial-time algorithm that asks $2^{5.83d}+2^{2d+o(d)}\log n$ queries.
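To get a feel for the size of the improvement, the four bounds quoted in this abstract can be compared numerically. The following is only a rough sketch: it keeps the leading terms, drops the polylogarithmic factors hidden by $\tilde O$ and the $o(d)$ terms in the exponents, and assumes a base-2 logarithm, none of which the abstract specifies. The function name `leading_terms` and the dictionary keys are illustrative choices, not taken from the paper.

```python
import math


def leading_terms(d: int, n: int) -> dict:
    """Leading terms of the query bounds for depth d and n variables.

    Assumptions (not from the paper): log is base 2, the factors hidden
    by tilde-O are dropped, and the o(d) exponent terms are ignored.
    """
    log_n = math.log2(n)
    return {
        # Prior randomized bound: 2^(2d) * log n
        "prior_randomized": 2 ** (2 * d) * log_n,
        # Prior deterministic bound: 2^(18d) * log n
        "prior_deterministic": 2 ** (18 * d) * log_n,
        # New randomized bound: 2^(2d) + 2^d * log n
        "new_randomized": 2 ** (2 * d) + 2 ** d * log_n,
        # New deterministic bound: 2^(5.83d) + 2^(2d) * log n
        "new_deterministic": 2 ** (5.83 * d) + 2 ** (2 * d) * log_n,
    }


if __name__ == "__main__":
    for name, value in leading_terms(d=3, n=1024).items():
        print(f"{name:>20}: {value:.3e}")
```

Under these simplifications, the randomized bounds differ mainly in that $\log n$ multiplies $2^{d}$ rather than $2^{2d}$, while the deterministic exponent drops from $18d$ to roughly $5.83d$.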
AdaBoost, Clearly Explained
AdaBoost is one of those machine learning methods that seems so much more confusing than it really is. NOTE: This video assumes you already know about Decision Trees... https://youtu.be/7VeUPuFGJHk
Entropy: How Decision Trees Make Decisions – Towards Data Science
You've come a long way from writing your first line of Python or R code. You know your way around Scikit-Learn like the back of your hand. You spend more time on Kaggle than Facebook now. You're no stranger to building awesome random forests and other tree based ensemble models that get the job done. You want to dig deeper and understand some of the intricacies and concepts behind popular machine learning models.
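As a small illustration of the concept named in the title, the sketch below computes Shannon entropy for a set of class labels and the information gain of a candidate split, which is the usual way entropy is used to choose splits in a decision tree. It is a minimal example, not code from the article; the function names and the `"buy"`/`"skip"` labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


if __name__ == "__main__":
    # Hypothetical labels: a perfectly mixed node split into two pure nodes.
    parent = ["buy", "buy", "skip", "skip"]
    split = [["buy", "buy"], ["skip", "skip"]]
    print(entropy(parent))
    print(information_gain(parent, split))
```

A split that separates the classes completely recovers all of the parent's entropy as gain, while a split that leaves each child as mixed as the parent yields no gain.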
Decision Trees -- An Intuitive Introduction – x8 -- The AI Community – Medium
Imagine you are out to buy a cell phone for yourself. The shopkeeper asks, "How can I help you, ma'am?" "I am looking for a cell phone." "You are at the right place, we have over 300 different types of cell phones. What kind of phone would you like to buy today?" Decision paralysis hits you. Totally confused among so many choices of phones, you go blank! "Let me help you choose a phone, ma'am. What screen size would you like?" "Umm… larger than 5.9 inches." "Perfect, and how about the camera?"
Introduction to machine learning with Weka - Target Veb
This tutorial gives a short, development-focused introduction to machine learning using Weka, one of the most widely used Java libraries for this purpose. Machine learning is a subfield of data science. If data science covers the entire process of obtaining knowledge, cleaning, analysis, visualization and data deployment, machine learning comprises the algorithms and techniques used in the analysis and modeling phase of this process. Within these, we will focus on supervised learning, which is often used for classification and regression problems. Classification applies when the class is discrete and the objective is to predict one of the mutually exclusive values of the target variable.
Predicting wind pressures around circular cylinders using machine learning techniques
Numerous studies have been carried out since the early 20th century to measure wind pressures around circular cylinders, owing to their engineering significance. Consequently, a large number of wind pressure data sets have accumulated, which presents an excellent opportunity for using machine learning (ML) techniques to train models to predict wind pressures around circular cylinders. Wind pressures around smooth circular cylinders are mainly a function of the Reynolds number (Re), the turbulence intensity (Ti) of the incident wind, and the circumferential angle of the cylinder. Taking these three parameters as inputs, this study trained two ML models to predict mean and fluctuating pressures, respectively. Three machine learning algorithms were tested: decision tree regressor, random forest, and gradient boosting regression trees (GBRT). The GBRT models exhibited the best performance for predicting both mean and fluctuating pressures, and they are capable of making accurate predictions for Re ranging from 10^4 to 10^6 and Ti ranging from 0% to 15%. It is believed that the GBRT models provide a very efficient and economical alternative to traditional wind tunnel tests and computational fluid dynamics simulations for determining wind pressures around smooth circular cylinders within the studied Re and Ti range.
Can You Always Bet Big On Machine Learning? - Analytics India Magazine
Machine learning sure is an umbrella word for many methodologies and tools but one must be clear about the fact that it is not an umbrella word for all the solutions. No one can deny that machine learning has revolutionised the way data can be squeezed in for discoveries. What one should care about is that the enhancement of any technology also depends on a relentless introspective approach in attacking the shortcomings. The rise in popularity sure lures every amateur into believing that they have reached their destination. With tools and frameworks being open-sourced, everyone can play with data, experiment with MNIST datasets and get really good accuracy scores.