AITopics

doi: 10.1613/jair.986

AI Access Foundation

10310

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)

Elomaa, T., Kaariainen, M.

An Analysis of Reduced Error Pruning

Journal of Artificial Intelligence ResearchSep-1-2001

Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the accuracy of the tree does not improve. Reduced Error Pruning is an algorithm that has been used as a representative technique in attempts to explain the problems of decision tree learning. In this paper we present analyses of Reduced Error Pruning in three different settings. First we study the basic algorithmic properties of the method, properties that hold independent of the input decision tree and pruning examples. Then we examine a situation that intuitively should lead to the subtree under consideration to be replaced by a leaf node, one in which the class label and attribute values of the pruning examples are independent of each other. This analysis is conducted under two different assumptions. The general analysis shows that the pruning probability of a node fitting pure noise is bounded by a function that decreases exponentially as the size of the tree grows. In a specific analysis we assume that the examples are distributed uniformly to the tree. This assumption lets us approximate the number of subtrees that are pruned because they do not receive any pruning examples. This paper clarifies the different variants of the Reduced Error Pruning algorithm, brings new insight to its algorithmic properties, analyses the algorithm with less imposed assumptions than before, and includes the previously overlooked empty subtrees to the analysis.

reduced error pruning

doi: 10.1613/jair.816

AI Access Foundation

10284

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Mansour, Yishay, McAllester, David A.

Boosting with Multi-Way Branching in Decision Trees

Neural Information Processing SystemsDec-31-2000

It is known that decision tree learning can be viewed as a form of boosting. However, existing boosting theorems for decision tree learning allow only binary-branching trees and the generalization to multi-branching trees is not immediate. Practical decision tree algorithms, such as CART and C4.5, implement a tradeoff between the number of branches and the improvement in tree quality as measured by an index function. Here we give a boosting justification for a particular quantitative tradeoff curve. Our main theorem states, in essence, that if we require an improvement proportional to the log of the number of branches then top-down greedy construction of decision trees remains an effective boosting algorithm.

algorithm, hypothesis, procedure, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Mansour, Yishay, McAllester, David A.

Boosting with Multi-Way Branching in Decision Trees

Neural Information Processing SystemsDec-31-2000

It is known that decision tree learning can be viewed as a form of boosting. However, existing boosting theorems for decision tree learning allow only binary-branching trees and the generalization to multi-branching trees is not immediate. Practical decision tree algorithms, such as CART and C4.5, implement a tradeoff between the number of branches and the improvement in tree quality as measured by an index function. Here we give a boosting justification for a particular quantitative tradeoff curve. Our main theorem states, in essence, that if we require an improvement proportional to the log of the number of branches then top-down greedy construction of decision trees remains an effective boosting algorithm.

algorithm, hypothesis, procedure, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Mansour, Yishay, McAllester, David A.

Boosting with Multi-Way Branching in Decision Trees

Neural Information Processing SystemsDec-31-2000

It is known that decision tree learning can be viewed as a form of boosting. However, existing boosting theorems for decision tree learning allow only binary-branching trees and the generalization to multi-branching trees is not immediate. Practical decision tree algorithms, suchas CART and C4.5, implement a tradeoff between the number of branches and the improvement in tree quality as measured by an index function. Here we give a boosting justification fora particular quantitative tradeoff curve. Our main theorem states, in essence, that if we require an improvement proportional to the log of the number of branches then top-down greedy construction ofdecision trees remains an effective boosting algorithm.

artificial intelligence, hypothesis, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Held, Marcus, Buhmann, Joachim M.

Unsupervised On-line Learning of Decision Trees for Hierarchical Data Analysis

An adaptive online algorithm is proposed to estimate hierarchical data structures for non-stationary data sources. The approach is based on the principle of minimum cross entropy to derive a decision tree for data clustering and it employs a metalearning idea (learning to learn) to adapt to changes in data characteristics. Its efficiency is demonstrated by grouping non-stationary artifical data and by hierarchical segmentation of LANDSAT images. 1 Introduction Unsupervised learning addresses the problem to detect structure inherent in unlabeled and unclassified data. N. The encoding usually is represented by an assignment matrix M (Mia), where Mia 1 if and only if Xi belongs to cluster L: 1 MiaV (Xi, Ya) measures the quality of a data partition, Le., optimal assignments and prototypes (M,y)OPt argminM,y1i (M,Y) minimize the inhomogeneity of clusters w.r.t. a given distance measure V. For reasons of simplicity we restrict the presentation to the ' sum-of-squared-error criterion V(x, y) To facilitate this minimization a deterministic annealing approach was proposed in [5] signments, which maps the discrete optimization problem, i.e. how to determine the data as via the Maximum Entropy Principle [2] to a continuous parameter es- Unsupervised Online Learning of Decision Trees for Data Analysis 515 timation problem.

data source, learning, prototype, (13 more...)

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Instructional Material > Online (0.41)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Golea, Mostefa, Bartlett, Peter L., Lee, Wee Sun, Mason, Llew

Generalization in Decision Trees and DNF: Does Size Matter?

Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification probability that do not depend on the size of the classifier, and hence can be considerably smaller than the bounds that follow from the VC theory. In this paper, we show that these techniques can be more widely applied, by representing other boolean functions as two-layer neural networks (thresholded convex combinations of boolean functions).

decision tree, misclassification probability, probability, (14 more...)

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Held, Marcus, Buhmann, Joachim M.

Unsupervised On-line Learning of Decision Trees for Hierarchical Data Analysis

An adaptive online algorithm is proposed to estimate hierarchical data structures for non-stationary data sources. The approach is based on the principle of minimum cross entropy to derive a decision tree for data clustering and it employs a metalearning idea (learning to learn) to adapt to changes in data characteristics. Its efficiency is demonstrated by grouping non-stationary artifical data and by hierarchical segmentation of LANDSAT images. 1 Introduction Unsupervised learning addresses the problem to detect structure inherent in unlabeled and unclassified data. N. The encoding usually is represented by an assignment matrix M (Mia), where Mia 1 if and only if Xi belongs to cluster L: 1 MiaV (Xi, Ya) measures the quality of a data partition, Le., optimal assignments and prototypes (M,y)OPt argminM,y1i (M,Y) minimize the inhomogeneity of clusters w.r.t. a given distance measure V. For reasons of simplicity we restrict the presentation to the ' sum-of-squared-error criterion V(x, y) To facilitate this minimization a deterministic annealing approach was proposed in [5] signments, which maps the discrete optimization problem, i.e. how to determine the data as via the Maximum Entropy Principle [2] to a continuous parameter es- Unsupervised Online Learning of Decision Trees for Data Analysis 515 timation problem.

data source, learning, prototype, (13 more...)

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Instructional Material > Online (0.41)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Golea, Mostefa, Bartlett, Peter L., Lee, Wee Sun, Mason, Llew

Generalization in Decision Trees and DNF: Does Size Matter?

Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification probability that do not depend on the size of the classifier, and hence can be considerably smaller than the bounds that follow from the VC theory. In this paper, we show that these techniques can be more widely applied, by representing other boolean functions as two-layer neural networks (thresholded convex combinations of boolean functions).

decision tree, misclassification probability, probability, (14 more...)

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Held, Marcus, Buhmann, Joachim M.

Unsupervised On-line Learning of Decision Trees for Hierarchical Data Analysis

An adaptive online algorithm is proposed to estimate hierarchical data structures for non-stationary data sources. The approach is based on the principle of minimum cross entropy to derive a decision tree for data clustering and it employs a metalearning idea (learning to learn) to adapt to changes in data characteristics. Its efficiency is demonstrated by grouping non-stationary artifical data and by hierarchical segmentation of LANDSAT images. 1 Introduction Unsupervised learning addresses the problem to detect structure inherent in unlabeled andunclassified data. N. The encoding usually is represented by an assignment matrix M (Mia), where Mia 1 if and only if Xi belongs to cluster L: 1 MiaV (Xi, Ya) measures the quality of a data partition, Le., optimal assignments and prototypes (M,y)OPt argminM,y1i (M,Y) minimize the inhomogeneity of clusters w.r.t. a given distance measure V. For reasons of simplicity we restrict the presentation to the ' sum-of-squared-error criterion V(x, y) To facilitate this minimization a deterministic annealing approach was proposed in [5] which maps the discrete optimization problem, i.e. how to determine the data assignments, viathe Maximum Entropy Principle [2] to a continuous parameter es- Unsupervised Online Learning ofDecision Trees for Data Analysis 515 timation problem.

artificial intelligence, data source, machine learning, (15 more...)