Decision Tree Learning
Enhanced version of AdaBoostM1 with J48 Tree learning method
Kang, Kyongche, Michalak, Jack
Machine Learning focuses on the construction and study of systems that can learn from data. This is connected with the classification problem, which usually is what Machine Learning algorithms are designed to solve. When a machine learning method is used by people with no special expertise in machine learning, it is important that the method be'robust' in classification, in the sense that reasonable performance is obtained with minimal tuning of the problem at hand. Algorithms are evaluated based on how'robust' they can classify the given data. In this paper, we propose a quantifiable measure of'robustness', and describe a particular learning method that is robust according to this measure in the context of classification problem. We proposed Adaptive Boosting (AdaBoostM1) with J48(C4.5 tree) as a base learner with tuning weight threshold (P) and number of iterations (I) for boosting algorithm. To benchmark the performance, we used the baseline classifier, AdaBoostM1 with Decision Stump as base learner without tuning parameters. By tuning parameters and using J48 as base learner, we are able to reduce the overall average error rate ratio (errorC/errorNB) from 2.4 to 0.9 for development sets of data and 2.1 to 1.2 for evaluation sets of data.
Predicting University Students' Academic Success and Choice of Major using Random Forests
Beaulac, Cรฉdric, Rosenthal, Jeffrey S.
Predicting University Students' Academic Success and Choice of Major using Random Forests C edric Beaulac Jeffrey S. Rosenthal August 31,2017 Abstract In this paper, a large data set containing every course taken by every undergraduate student in a major university in Canada over 10 years is analyzed. Modern machine learning algorithms can use large data sets to build useful tools for the data provider, in this case, the university. In this article, two classifiers are constructed using random forests. To begin, the first two semesters of courses completed by a student are used to predict if they will obtain an undergraduate degree. Secondly, for the students that completed a program, their major choice is predicted using once again the first few courses they've registered to. A classification tree is an intuitive and powerful classifier and building a random forest of trees lowers the variance of the classifier and also prevents overfitting. Random forests also allow for reliable variable importance measurements. These measures explain what variables are useful to both of the classifiers and can be used to better understand what is statistically related to the students' choices. The results are two accurate classifiers and a variable importance analysis that provides useful information to the university. Keywords: Higher Education, Students' Success and Choice, Machine Learning, Classification Tree, Random Forest, Variable Importance 1 Introduction As the demand for qualified labour increases it becomes more and more important to understand what motivates students to complete their program and how they select their majors. In parallel, universities are continuously trying to improve their programs and attract more students. It would be useful for a university to be able to predict whether or not a student that begins a program will complete it.
Using Discretization for Extending the Set of Predictive Features
Rosenfeld, Avi, Illuz, Ron, Gottesman, Dovid, Last, Mark
To date, attribute discretization is typically performed by replacing the original set of continuous features with a transposed set of discrete ones. This paper provides support for a new idea that discretized features should often be used in addition to existing features and as such, datasets should be extended, and not replaced, by discretization. We also claim that discretization algorithms should be developed with the explicit purpose of enriching a non-discretized dataset with discretized values. We present such an algorithm, D-MIAT, a supervised algorithm that discretizes data based on Minority Interesting Attribute Thresholds. D-MIAT only generates new features when strong indications exist for one of the target values needing to be learned and thus is intended to be used in addition to the original data. We present extensive empirical results demonstrating the success of using D-MIAT on $ 28 $ benchmark datasets. We also demonstrate that $ 10 $ other discretization algorithms can also be used to generate features that yield improved performance when used in combination with the original non-discretized data. Our results show that the best predictive performance is attained using a combination of the original dataset with added features from a "standard" supervised discretization algorithm and D-MIAT.
Decision Trees in Machine Learning, Simplified
I did a series of blog posts on different machine learning techniques recently, which sparked a lot of interest. You can see part 1, part 2, and part 3 if you want to learn about classification, clustering, regression, and so on. In that series I was careful to differentiate between a general technique and a specific algorithm like decision trees. Classification, for example, is a general technique used to identify members of a known class like fraudulent transactions, bananas, or high value customers. Read this machine learning post if you need a refresher or are wondering quite what bananas have to do with machine learning.
Adapting to Concept Drift in Credit Card Transaction Data Streams Using Contextual Bandits and Decision Trees
Soemers, Dennis J. N. J. (Vrije Universiteit Brussel) | Brys, Tim (Vrije Universiteit Brussel) | Driessens, Kurt (Maastricht University) | Winands, Mark H. M. (Maastricht University) | Nowรฉ, Ann (Vrije Universiteit Brussel)
Credit card transactions predicted to be fraudulent by automated detection systems are typically handed over to human experts for verification. To limit costs, it is standard practice to select only the most suspicious transactions for investigation. We claim that a trade-off between exploration and exploitation is imperative to enable adaptation to changes in behavior (concept drift). Exploration consists of the selection and investigation of transactions with the purpose of improving predictive models, and exploitation consists of investigating transactions detected to be suspicious. Modeling the detection of fraudulent transactions as rewarding, we use an incremental Regression Tree learner to create clusters of transactions with similar expected rewards. This enables the use of a Contextual Multi-Armed Bandit (CMAB) algorithm to provide the exploration/exploitation trade-off. We introduce a novel variant of a CMAB algorithm that makes use of the structure of this tree, and use Semi-Supervised Learning to grow the tree using unlabeled data. The approach is evaluated on a real dataset and data generated by a simulator that adds concept drift by adapting the behavior of fraudsters to avoid detection. It outperforms frequently used offline models in terms of cumulative rewards, in particular in the presence of concept drift.
MDP-Based Cost Sensitive Classification Using Decision Trees
Maliah, Shlomi (Ben Gurion University) | Shani, Guy (Ben Gurion University)
In classification, an algorithm learns to classify a given instance based on a set of observed attribute values. In many real world cases testing the value of an attribute incurs a cost. Furthermore, there can also be a cost associated with the misclassification of an instance. Cost sensitive classification attempts to minimize the expected cost of classification, by deciding after each observed attribute value, which attribute to measure next. In this paper we suggest Markov Decision Processes as a modeling tool for cost sensitive classification. We construct standard decision trees over all attribute subsets, and the leaves of these trees become the state space of our MDP. At each phase we decide on the next attribute to measure, balancing the cost of the measurement and the classification accuracy. We compare our approach to a set of previous approaches, showing our approach to work better for a range of misclassification costs.
Differential Performance Debugging With Discriminant Regression Trees
Tizpaz-Niari, Saeid (University of Colorado Boulder) | Cerny, Pavol (University of Colorado Boulder) | Chang, Bor-Yuh Evan (University of Colorado Boulder) | Trivedi, Ashutosh (University of Colorado Boulder)
Differential performance debugging is a technique to find performance problems. It applies in situations where the performance of a program is (unexpectedly) different for different classes of inputs. The task is to explain the differences in asymptotic performance among various input classes in terms of program internals. We propose a data-driven technique based on discriminant regression tree (DRT) learning problem where the goal is to discriminate among different classes of inputs. We propose a new algorithm for DRT learning that first clusters the data into functional clusters, capturing different asymptotic performance classes, and then invokes off-the-shelf decision tree learning algorithms to explain these clusters. We focus on linear functional clusters and adapt classical clustering algorithms (K-means and spectral) to produce them. For the K-means algorithm, we generalize the notion of the cluster centroid from a point to a linear function. We adapt spectral clustering by defining a novel kernel function to capture the notion of linear similarity between two data points. We evaluate our approach on benchmarks consisting of Java programs where we are interested in debugging performance. We show that our algorithm significantly outperforms other well-known regression tree learning algorithms in terms of running time and accuracy of classification.
Estimating the Class Prior in Positive and Unlabeled Data Through Decision Tree Induction
Bekker, Jessa (KU Leuven) | Davis, Jesse (KU Leuven)
For tasks such as medical diagnosis and knowledge base completion, a classifier may only have access to positive and unlabeled examples, where the unlabeled data consists of both positive and negative examples. One way that enables learning from this type of data is knowing the true class prior. In this paper, we propose a simple yet effective method for estimating the class prior, by estimating the probability that a positive example is selected to be labeled. Our key insight is that subdomains of the data give a lower bound on this probability. This lower bound gets closer to the real probability as the ratio of labeled examples increases. Finding such subsets can naturally be done via top-down decision tree induction. Experiments show that our method makes estimates which are equivalently accurate as those of the state of the art methods, and is an order of magnitude faster.
Death vs. Data Science: Predicting End of Life
Ahmad, Muhammad A. (KenSci Inc.) | Eckert, Carly (KenSci Inc.) | McKelvey, Greg (KenSci Inc.) | Zolfagar, Kiyana (KenSci Inc.) | Zahid, Anam (KenSci Inc.) | Teredesai, Ankur (KenSci Inc.)
Death is an inevitable part of life and while it cannot be delayed indefinitely it is possible to predict with some certainty when the health of a person is going to deteriorate. In this paper, we predict risk of mortality for patients from two large hospital systems in the Pacific Northwest. Using medical claims and electronic medical records (EMR) data we greatly improve prediction for risk of mortality and explore machine learning models with explanations for end of life predictions. The insights that are derived from the predictions can then be used to improve the quality of patient care towards the end of life.