AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
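As a rough illustration of the quotation above, a decision tree for a Boolean function can be written as nested attribute tests with yes/no leaves. The sketch below is not taken from the textbook: the attributes `raining` and `have_umbrella`, the `TREE` structure, and the `classify` helper are all hypothetical names chosen for this example.

```python
# Hypothetical example: a tiny decision tree deciding "go outside?".
# Internal nodes are dicts with a test attribute and one branch per value;
# leaves are plain booleans (the yes/no decision).
TREE = {
    "test": "raining",
    "branches": {
        True: {
            "test": "have_umbrella",
            "branches": {True: True, False: False},
        },
        False: True,
    },
}


def classify(node, example):
    """Follow the branching tests from the root until a leaf is reached."""
    while isinstance(node, dict):
        value = example[node["test"]]
        node = node["branches"][value]
    return node


if __name__ == "__main__":
    print(classify(TREE, {"raining": True, "have_umbrella": False}))
```

This particular tree represents the Boolean function `(not raining) or have_umbrella`, which is one way to see why yes/no decision trees correspond to Boolean functions.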

Ensemble Models with Trees and Rules

Akdemir, Deniz

arXiv.org Machine Learning, Aug-23-2012

In this article, we have proposed several approaches for post-processing a large ensemble of prediction models or rules. The results from our simulations show that the post-processing methods we have considered here are promising. We have used the techniques developed here for estimation of quantitative traits from markers, on the benchmark "Boston Housing" data set, and in some simulations. In most cases, the produced models had better prediction performance than, for example, the ones produced by the random forest or the RuleFit algorithms.

artificial intelligence, ensemble, machine learning, (14 more...)

arXiv.org Machine Learning

1112.3699

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.51)

Performance Tuning Of J48 Algorithm For Prediction Of Soil Fertility

Gholap, Jay

arXiv.org Machine Learning, Aug-20-2012

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use ("data mining", Wikipedia). A soil test is the analysis of a soil sample to determine nutrient content, composition and other characteristics. Tests are usually performed to measure fertility and indicate deficiencies that need to be remedied ("Soil Test", Wikipedia). In this research, a soil dataset containing soil test results has been used to apply various classification techniques in data mining. Soil fertility is a crucial attribute considered in land evaluation, and achieving and maintaining the necessary levels of fertility is important for crop production. This paper therefore includes steps for building an efficient and accurate predictive model of soil fertility with the help of the J48 algorithm.

artificial intelligence, data mining, machine learning, (14 more...)

arXiv.org Machine Learning

1208.3943

Country:

North America > United States (0.49)
Asia > India > Maharashtra > Pune (0.15)
Oceania > New Zealand > North Island > Waikato (0.15)

Genre: Research Report (0.51)

Industry: Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)

Learning Driver's Behavior to Improve the Acceptance of Adaptive Cruise Control

Rosenfeld, Avi (Jerusalem College of Technology) | Bareket, Zevi (University of Michigan) | Goldman, Claudia V. (General Motors Advanced Technical Center) | Kraus, Sarit (Bar-Ilan University) | LeBlanc, David J. (University of Michigan) | Tsimhoni, Omer (General Motors Advanced Technical Center)

AAAI Conferences, Jul-21-2012

Adaptive Cruise Control (ACC) is a technology that allows a vehicle to automatically adjust its speed to maintain a preset distance from the vehicle in front of it based on the driver's preferences. Individual drivers have different driving styles and preferences. Current systems do not distinguish among the users. We introduce a method to combine machine learning algorithms with demographic information and expert advice into existing automated assistive systems. This method can save on the interactions between drivers and automated systems by adjusting parameters relevant to the operation of these systems based on their specific drivers and the driving context. We also learn when users tend to engage and disengage the automated system. This method sheds light on the kinds of dynamics that users develop while interacting with automation and can teach us how to improve these systems for the benefit of their users. While accepted packages such as Weka were successful in learning drivers' behavior, we found that improved learning models could be developed by adding information on drivers' demographics and a previously developed model about different driver types. We present the general methodology of our learning procedure and suggest applications of our approach to other domains as well.

accuracy, information, vehicle, (14 more...)

AAAI Conferences

Twenty-Fourth IAAI Conference

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre: Research Report (0.93)

Industry:

Transportation > Passenger (1.00)
Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)

Using a Critic to Promote Less Popular Candidates in a People-to-People Recommender System

Krzywicki, Alfred (University of New South Wales) | Wobcke, Wayne (University of New South Wales) | Cai, Xiongcai (University of New South Wales) | Bain, Michael (University of New South Wales) | Mahidadia, Ashesh (University of New South Wales) | Compton, Paul (University of New South Wales) | Kim, Yang Sok (University of New South Wales)

AAAI Conferences, Jul-21-2012

This paper shows how to improve the recommendations of an interaction-based collaborative filtering (IBCF) recommender used in online dating. Previous work has shown that IBCF works well in this domain, although it tends to rank popular candidates highly, which leads to these users receiving a large number of contacts. We address this problem by using a Decision Tree model as a "critic" to re-rank the candidates generated by IBCF, effectively promoting less popular candidates. This method was first evaluated on historical data from a large online dating site and then trialled live on the same site by providing recommendations to a large number of users throughout a 9-week period. The live trial confirmed the consistency of the analysis on historical data and the ability of the method to generate suitable candidates over an extended period. Our recommendations gave higher success rates than those for a control group made with a baseline recommender.

interaction, recommendation, recommender, (15 more...)

AAAI Conferences

Twenty-Fourth IAAI Conference

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Genre:

Research Report > Experimental Study (0.58)
Research Report > New Finding (0.48)

Industry: Telecommunications (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Table Header Detection and Classification

Fang, Jing (Peking University) | Mitra, Prasenjit (The Pennsylvania State University) | Tang, Zhi (Peking University) | Giles, C. Lee (The Pennsylvania State University)

AAAI Conferences, Jul-21-2012

In digital libraries, a table, as a specific document component as well as a condensed way to present structured and relational data, contains rich information and is often the only source of that information. In order to explore, retrieve, and reuse that data, tables should be identified and the data extracted. Table recognition is an old field of research. However, due to the diversity of table styles, the results are still far from satisfactory, and no single algorithm performs well on all types of tables. In this paper, we randomly take samples from CiteSeerX to investigate diverse table styles for automatic table extraction. We find that table headers are one of the main characteristics of complex table styles. We identify a set of features that can be used to segregate headers from tabular data and build a classifier to detect table headers. Our empirical evaluation on PDF documents shows that using a Random Forest classifier achieves an accuracy of 92%.

header, machine learning, natural language, (21 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Pennsylvania > Centre County > University Park (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Biogeography-Based Informative Gene Selection and Cancer Classification Using SVM and Random Forests

Nikumbh, Sarvesh | Ghosh, Shameek | Jayaraman, Valadi

arXiv.org Machine Learning, Jul-12-2012

Microarray cancer gene expression data are very high-dimensional. Reducing the dimensions helps in improving the overall analysis and classification performance. We propose two hybrid techniques, Biogeography-based Optimization - Random Forests (BBO-RF) and BBO-SVM (Support Vector Machines), with gene ranking as a heuristic, for microarray gene expression analysis. This heuristic is obtained from an information gain filter ranking procedure. The BBO algorithm generates a population of candidate subsets of genes, as part of an ecosystem of habitats, and employs the migration and mutation processes across multiple generations of the population to improve the classification accuracy. The fitness of each gene subset is assessed by the classifiers, SVM and Random Forests. The performances of these hybrid techniques are evaluated on three cancer gene expression datasets retrieved from the Kent Ridge Biomedical datasets collection and the libSVM data repository. Our results demonstrate that genes selected by the proposed techniques yield classification accuracies comparable to previously reported algorithms.

classification, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1109/CEC.2012.6256127

1207.3285

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > India > Maharashtra > Pune (0.05)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report > New Finding (0.54)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
(2 more...)

Tracking Tetrahymena Pyriformis Cells using Decision Trees

Wang, Quan | Ou, Yan | Julius, A. Agung | Boyer, Kim L. | Kim, Min Jun

arXiv.org Machine Learning, Jul-12-2012

Matching cells over time has long been the most difficult step in cell tracking. In this paper, we approach this problem by recasting it as a classification problem. We construct a feature set for each cell, and compute a feature difference vector between a cell in the current frame and a cell in a previous frame. Then we determine whether the two cells represent the same cell over time by training decision trees as our binary classifiers. With the output of decision trees, we are able to formulate an assignment problem for our cell association task and solve it using a modified version of the Hungarian algorithm.
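The assignment step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the cost matrix values are made up (imagine each entry is 1 minus a classifier's estimated probability that two detections are the same cell), the function name `best_assignment` is hypothetical, and the solver is a brute-force search over permutations, not the modified Hungarian algorithm the authors use. Brute force returns the same optimum but is only practical for a handful of cells.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (assignment, total_cost) minimising the summed cost.

    cost[i][j] is the cost of matching cell i in the previous frame to
    cell j in the current frame. Brute-force stand-in for the Hungarian
    algorithm; runtime grows factorially with the number of cells.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


if __name__ == "__main__":
    # Made-up costs for three cells in two consecutive frames.
    cost = [
        [0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9],
        [0.9, 0.8, 0.3],
    ]
    assignment, total = best_assignment(cost)
    print(assignment, round(total, 3))
```

For realistic numbers of cells, a polynomial-time solver such as the Hungarian algorithm would replace the permutation loop.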

artificial intelligence, decision tree learning, machine learning, (11 more...)

arXiv.org Machine Learning

1207.3127

Country: North America > United States (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.84)

MOB-ESP and other Improvements in Probability Estimation

Nielsen, Rodney

arXiv.org Artificial Intelligence, Jul-11-2012

A key prerequisite to optimal reasoning under uncertainty in intelligent systems is to start with good class probability estimates. This paper improves on the current best probability estimation trees (Bagged-PETs) and also presents a new ensemble-based algorithm (MOB-ESP). Comparisons are made using several benchmark datasets and multiple metrics. These experiments show that MOB-ESP outputs significantly more accurate class probabilities than either the baseline B-PETs algorithm or the enhanced version presented here (EB-PETs). These results are based on metrics closely associated with the average accuracy of the predictions. MOB-ESP also provides much better probability rankings than B-PETs. The paper further suggests how these estimation techniques can be applied in concert with a broader category of classifiers.

artificial intelligence, machine learning, probability estimate, (17 more...)

arXiv.org Artificial Intelligence

1207.4132

Country: North America > United States > Colorado > Boulder County > Boulder (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Obtaining Calibrated Probabilities from Boosting

Niculescu-Mizil, Alexandru | Caruana, Richard A.

arXiv.org Machine Learning, Jul-4-2012

Boosted decision trees typically yield good accuracy, precision, and ROC area. However, because the outputs from boosting are not well calibrated posterior probabilities, boosting yields poor squared error and cross-entropy. We empirically demonstrate why AdaBoost predicts distorted probabilities and examine three calibration methods for correcting this distortion: Platt Scaling, Isotonic Regression, and Logistic Correction. We also experiment with boosting using log-loss instead of the usual exponential loss. Experiments show that Logistic Correction and boosting with log-loss work well when boosting weak models such as decision stumps, but yield poor performance when boosting more complex models such as full decision trees. Platt Scaling and Isotonic Regression, however, significantly improve the probabilities predicted by

artificial intelligence, calibration, machine learning, (16 more...)

arXiv.org Machine Learning

1207.1403

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.47)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Improved Information Gain Estimates for Decision Tree Induction

Nowozin, Sebastian

arXiv.org Machine Learning, Jun-18-2012

Ensembles of classification and regression trees remain popular machine learning methods because they define flexible non-parametric models that predict well and are computationally efficient both during training and testing. During induction of decision trees one aims to find predicates that are maximally informative about the prediction target. To select good predicates most approaches estimate an information-theoretic scoring function, the information gain, both for classification and regression problems. We point out that the common estimation procedures are biased and show that by replacing them with improved estimators of the discrete and the differential entropy we can obtain better decision trees. In effect our modifications yield improved predictive performance and are simple to implement in any decision tree code.
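For readers unfamiliar with the scoring function mentioned in this abstract, here is a minimal sketch of the common plug-in estimate of information gain for a binary split in classification. It shows the standard estimator that the abstract describes as biased, not the improved estimators the paper proposes; the function names and the toy labels are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Plug-in (empirical frequency) estimate of Shannon entropy in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(labels, goes_left):
    """Entropy reduction from splitting `labels` by a boolean predicate.

    goes_left[i] is True when example i satisfies the candidate predicate.
    """
    left = [y for y, m in zip(labels, goes_left) if m]
    right = [y for y, m in zip(labels, goes_left) if not m]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    print(information_gain(labels, [True, True, False, False]))  # perfect split
    print(information_gain(labels, [True, False, True, False]))  # uninformative split
```

During tree induction, a learner would evaluate this score for each candidate predicate and keep the one with the highest gain.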

artificial intelligence, estimator, machine learning, (13 more...)

arXiv.org Machine Learning

1206.462

Country:

Europe > United Kingdom (0.46)
North America > United States (0.28)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
