AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Node harvest

Meinshausen, Nicolai

arXiv.org Machine LearningJan-7-2011

When choosing a suitable technique for regression and classification with multivariate predictor variables, one is often faced with a tradeoff between interpretability and high predictive accuracy. To give a classical example, classification and regression trees are easy to understand and interpret. Tree ensembles like Random Forests provide usually more accurate predictions. Yet tree ensembles are also more difficult to analyze than single trees and are often criticized, perhaps unfairly, as `black box' predictors. Node harvest is trying to reconcile the two aims of interpretability and predictive accuracy by combining positive aspects of trees and tree ensembles. Results are very sparse and interpretable and predictive accuracy is extremely competitive, especially for low signal-to-noise data. The procedure is simple: an initial set of a few thousand nodes is generated randomly. If a new observation falls into just a single node, its prediction is the mean response of all training observation within this node, identical to a tree-like prediction. A new observation falls typically into several nodes and its prediction is then the weighted average of the mean responses across all these nodes. The only role of node harvest is to `pick' the right nodes from the initial large ensemble of nodes by choosing node weights, which amounts in the proposed algorithm to a quadratic programming problem with linear inequality constraints. The solution is sparse in the sense that only very few nodes are selected with a nonzero weight. This sparsity is not explicitly enforced. Maybe surprisingly, it is not necessary to select a tuning parameter for optimal predictive accuracy. Node harvest can handle mixed data and missing values and is shown to be simple to interpret and competitive in predictive accuracy on a variety of data sets.

artificial intelligence, machine learning, node, (18 more...)

arXiv.org Machine Learning

doi: 10.1214/10-AOAS367

0910.2145

Country:

North America > United States > California > Orange County > Irvine (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Wisconsin (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

A Theory of Multiclass Boosting

Mukherjee, Indraneel, Schapire, Robert E.

Neural Information Processing SystemsDec-31-2010

Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the "correct" requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements.

algorithm, classifier, weak-learning condition, (16 more...)

Neural Information Processing Systems

Country: North America > United States > New Jersey > Mercer County > Princeton (0.04)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)

Add feedback

A Theory of Multiclass Boosting

Mukherjee, Indraneel, Schapire, Robert E.

Neural Information Processing SystemsDec-31-2010

algorithm, classifier, weak-learning condition, (16 more...)

Neural Information Processing Systems

Country: North America > United States > New Jersey > Mercer County > Princeton (0.04)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)

Add feedback

(RF)^2 -- Random Forest Random Field

Payet, Nadia, Todorovic, Sinisa

Neural Information Processing SystemsDec-31-2010

We combine random forest (RF) and conditional random field (CRF) into a new computational framework, called random forest random field (RF)^2. Inference of (RF)^2 uses the Swendsen-Wang cut algorithm, characterized by Metropolis-Hastings jumps. A jump from one state to another depends on the ratio of the proposal distributions, and on the ratio of the posterior distributions of the two states. Prior work typically resorts to a parametric estimation of these four distributions, and then computes their ratio. Our key idea is to instead directly estimate these ratios using RF. RF collects in leaf nodes of each decision tree the class histograms of training examples. We use these class histograms for a non-parametric estimation of the distribution ratios. We derive the theoretical error bounds of a two-class (RF)^2. (RF)^2 is applied to a challenging task of multiclass object recognition and segmentation over a random field of input image regions. In our empirical evaluation, we use only the visual information provided by image regions (e.g., color, texture, spatial layout), whereas the competing methods additionally use higher-level cues about the horizon location and 3D layout of surfaces in the scene. Nevertheless, (RF)^2 outperforms the state of the art on benchmark datasets, in terms of accuracy and computation time.

decision tree, inference, posterior distribution, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Oregon (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.81)

Add feedback

A Theory of Multiclass Boosting

Mukherjee, Indraneel, Schapire, Robert E.

Neural Information Processing SystemsDec-31-2010

artificial intelligence, machine learning, weak-learning condition, (18 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)

Add feedback

Multivariate Dyadic Regression Trees for Sparse Learning Problems

Liu, Han, Chen, Xi

Neural Information Processing SystemsDec-31-2010

We propose a new nonparametric learning method based on multivariate dyadic regression trees (MDRTs). Unlike traditional dyadic decision trees (DDTs) or classification and regression trees (CARTs), MDRTs are constructed using penalized empirical risk minimization with a novel sparsity-inducing penalty. Theoretically, we show that MDRTs can simultaneously adapt to the unknown sparsity and smoothness of the true regression functions, and achieve the nearly optimal rates of convergence (in a minimax sense) for the class of $(\alpha, C)$-smooth functions. Empirically, MDRTs can simultaneously conduct function estimation and variable selection in high dimensions. To make MDRTs applicable for large-scale learning problems, we propose a greedy heuristics. The superior performance of MDRTs are demonstrated on both synthetic and real datasets.

artificial intelligence, machine learning, mdrt, (15 more...)

Neural Information Processing Systems

Country: North America (0.28)

Industry: Education > Focused Education > Special Education (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Graph-Valued Regression

Liu, Han, Chen, Xi, Wasserman, Larry, Lafferty, John D.

Neural Information Processing SystemsDec-31-2010

Undirected graphical models encode in a graph $G$ the dependency structure of a random vector $Y$. In many applications, it is of interest to model $Y$ given another random vector $X$ as input. We refer to the problem of estimating the graph $G(x)$ of $Y$ conditioned on $X=x$ as ``graph-valued regression''. In this paper, we propose a semiparametric method for estimating $G(x)$ that builds a tree on the $X$ space just as in CART (classification and regression trees), but at each leaf of the tree estimates a graph. We call the method ``Graph-optimized CART'', or Go-CART. We study the theoretical properties of Go-CART using dyadic partitioning trees, establishing oracle inequalities on risk minimization and tree partition consistency. We also demonstrate the application of Go-CART to a meteorological dataset, showing how graph-valued regression can provide a useful tool for analyzing complex data.

artificial intelligence, machine learning, partition, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Toward Fast Mapping for Robot Adjective Learning

Petrosino, Allison (Wellesley College) | Gold, Kevin (Rochester Institute of Technology)

AAAI ConferencesNov-5-2010

Fast mapping is a phenomenon by which children learn the meanings of novel adjectives after a very small number of exposures when the new word is contrasted with a known word. The present study was a preliminary test of whether machine learners could use such contrasts in unconstrained speech to learn adjective meanings and categories. Six decision tree-based learning methods were evaluated that use contrasting examples in order to work toward an adjective fast-mapping system for machine learners. Subjects tended to compare objects using adjectives of the same category, implying that such contrasts may be a useful source of data about adjective meaning, though none of the learning algorithms showed strong advantages over any other.

adjective, category, participant, (16 more...)

AAAI Conferences

2010 AAAI Fall Symposium Series

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)
North America > United States > New Jersey (0.04)
(5 more...)

Industry: Education (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)

Add feedback

Significance of Classification Techniques in Prediction of Learning Disabilities

Balakrishnan, Julie M. David And Kannan

arXiv.org Artificial IntelligenceNov-2-2010

The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in prediction of learning disabilities (LD) of school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in Data mining. Different rules extracted from the decision tree are used for prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in the LD affected child. In this paper, J48 algorithm is used for constructing the decision tree and K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified.

artificial intelligence, expert system, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.5121/ijaia.2010.1409

1011.0628

Country:

North America > United States (0.14)
Asia > India > Kerala (0.05)
Europe > Slovenia (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education > Focused Education > Special Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.93)
(2 more...)

Add feedback

Machine Learning Approaches for Modeling Spammer Behavior

Islam, Md. Saiful, Mahmud, Abdullah Al, Islam, Md. Rafiqul

arXiv.org Artificial IntelligenceAug-25-2010

Spam is commonly known as unsolicited or unwanted email messages in the Internet causing potential threat to Internet Security. Users spend a valuable amount of time deleting spam emails. More importantly, ever increasing spam emails occupy server storage space and consume network bandwidth. Keyword-based spam email filtering strategies will eventually be less successful to model spammer behavior as the spammer constantly changes their tricks to circumvent these filters. The evasive tactics that the spammer uses are patterns and these patterns can be modeled to combat spam. This paper investigates the possibilities of modeling spammer behavioral patterns by well-known classification algorithms such as Na\"ive Bayesian classifier (Na\"ive Bayes), Decision Tree Induction (DTI) and Support Vector Machines (SVMs). Preliminary experimental results demonstrate a promising detection rate of around 92%, which is considerably an enhancement of performance compared to similar spammer behavior modeling research.

artificial intelligence, machine learning, spam filtering, (16 more...)

arXiv.org Artificial Intelligence

1008.4249

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(3 more...)

Genre: Research Report > New Finding (0.35)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)

Add feedback