AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Some Open Problems in Optimal AdaBoost and Decision Stumps

Belanich, Joshua, Ortiz, Luis E.

arXiv.org Machine LearningMay-26-2015

The significance of the study of the theoretical and practical properties of AdaBoost is unquestionable, given its simplicity, wide practical use, and effectiveness on real-world datasets. Here we present a few open problems regarding the behavior of "Optimal AdaBoost," a term coined by Rudin, Daubechies, and Schapire in 2004 to label the simple version of the standard AdaBoost algorithm in which the weak learner that AdaBoost uses always outputs the weak classifier with lowest weighted error among the respective hypothesis class of weak classifiers implicit in the weak learner. We concentrate on the standard, "vanilla" version of Optimal AdaBoost for binary classification that results from using an exponential-loss upper bound on the misclassification training error. We present two types of open problems. One deals with general weak hypotheses. The other deals with the particular case of decision stumps, as often and commonly used in practice. Answers to the open problems can have immediate significant impact to (1) cementing previously established results on asymptotic convergence properties of Optimal AdaBoost, for finite datasets, which in turn can be the start to any convergence-rate analysis; (2) understanding the weak-hypotheses class of effective decision stumps generated from data, which we have empirically observed to be significantly smaller than the typically obtained class, as well as the effect on the weak learner's running time and previously established improved bounds on the generalization performance of Optimal AdaBoost classifiers; and (3) shedding some light on the "self control" that AdaBoost tends to exhibit in practice.

artificial intelligence, decision tree learning, machine learning, (13 more...)

arXiv.org Machine Learning

1505.06999

Country: North America > United States > California > Santa Clara County (0.15)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.55)

Add feedback

Greedy Biomarker Discovery in the Genome with Applications to Antimicrobial Resistance

Drouin, Alexandre, Giguère, Sébastien, Déraspe, Maxime, Laviolette, François, Marchand, Mario, Corbeil, Jacques

arXiv.org Machine LearningMay-22-2015

The Set Covering Machine (SCM) is a greedy learning algorithm that produces sparse classifiers. We extend the SCM for datasets that contain a huge number of features. The whole genetic material of living organisms is an example of such a case, where the number of feature exceeds 10^7. Three human pathogens were used to evaluate the performance of the SCM at predicting antimicrobial resistance. Our results show that the SCM compares favorably in terms of sparsity and accuracy against L1 and L2 regularized Support Vector Machines and CART decision trees. Moreover, the SCM was the only algorithm that could consider the full feature space. For all other algorithms, the latter had to be filtered as a preprocessing step.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Machine Learning

1505.06249

Country: North America > Canada > Quebec (0.15)

Genre: Research Report > New Finding (0.54)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.56)

Add feedback

DART: Dropouts meet Multiple Additive Regression Trees

Rashmi, K. V., Gilad-Bachrach, Ran

arXiv.org Machine LearningMay-7-2015

Multiple Additive Regression Trees (MART), an ensemble model of boosted regression trees, is known to deliver high prediction accuracy for diverse tasks, and it is widely used in practice. However, it suffers an issue which we call over-specialization, wherein trees added at later iterations tend to impact the prediction of only a few instances, and make negligible contribution towards the remaining instances. This negatively affects the performance of the model on unseen data, and also makes the model over-sensitive to the contributions of the few, initially added tress. We show that the commonly used tool to address this issue, that of shrinkage, alleviates the problem only to a certain extent and the fundamental issue of over-specialization still remains. In this work, we explore a different approach to address the problem that of employing dropouts, a tool that has been recently proposed in the context of learning deep neural networks. We propose a novel way of employing dropouts in MART, resulting in the DART algorithm. We evaluate DART on ranking, regression and classification tasks, using large scale, publicly available datasets, and show that DART outperforms MART in each of the tasks, with a significant margin. We also show that DART overcomes the issue of over-specialization to a considerable extent.

artificial intelligence, ensemble, machine learning, (20 more...)

arXiv.org Machine Learning

1505.01866

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.85)

Add feedback

Random Forests Can Hash

Qiu, Qiang, Sapiro, Guillermo, Bronstein, Alex

arXiv.org Machine LearningApr-16-2015

Hash codes are a very efficient data representation needed to be able to cope with the ever growing amounts of data. We introduce a random forest semantic hashing scheme with information-theoretic code aggregation, showing for the first time how random forest, a technique that together with deep learning have shown spectacular results in classification, can also be extended to large-scale retrieval. Traditional random forest fails to enforce the consistency of hashes generated from each tree for the same class data, i.e., to preserve the underlying similarity, and it also lacks a principled way for code aggregation across trees. We start with a simple hashing scheme, where independently trained random trees in a forest are acting as hashing functions. We the propose a subspace model as the splitting function, and show that it enforces the hash consistency in a tree for data from the same class. We also introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. Experiments on large-scale public datasets are presented, showing that the proposed approach significantly outperforms state-of-the-art hashing methods for retrieval tasks.

artificial intelligence, machine learning, random forest, (16 more...)

arXiv.org Machine Learning

1412.5083

Country: North America > Canada (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

HHCART: An Oblique Decision Tree

Wickramarachchi, D. C., Robertson, B. L., Reale, M., Price, C. J., Brown, J.

arXiv.org Machine LearningApr-14-2015

Decision trees are a popular technique in statistical data classification. They recursively partition the feature space into disjoint sub-regions until each sub-region becomes homogeneous with respect to a particular class. The basic Classification and Regression Tree (CART) algorithm partitions the feature space using axis parallel splits. When the true decision boundaries are not aligned with the feature axes, this approach can produce a complicated boundary structure. Oblique decision trees use oblique decision boundaries to potentially simplify the boundary structure. The major limitation of this approach is that the tree induction algorithm is computationally expensive. In this article we present a new decision tree algorithm, called HHCART. The method utilizes a series of Householder matrices to reflect the training data at each node during the tree construction. Each reflection is based on the directions of the eigenvectors from each classes' covariance matrix. Considering axis parallel splits in the reflected training data provides an efficient way of finding oblique splits in the unreflected training data. Experimental results show that the accuracy and size of the HHCART trees are comparable with some benchmark methods in the literature. The appealing feature of HHCART is that it can handle both qualitative and quantitative features in the same oblique split.

artificial intelligence, hhcart, machine learning, (18 more...)

arXiv.org Machine Learning

1504.03415

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

A Boosted Multi-Task Model for Pedestrian Detection with Occlusion Handling

Zhu, Chao (Peking University) | Peng, Yuxin (Peking University)

AAAI ConferencesMar-6-2015

Pedestrian detection is a challenging problem in computer vision. Especially, a major bottleneck for current state-of-the-art methods is the significant performance decline with increasing occlusion. A common technique for occlusion handling is to train a set of occlusion-specific detectors and merge their results directly. These detectors are trained independently and the relationship among them is ignored. In this paper, we consider pedestrian detection in different occlusion levels as different but related problems, and propose a multi-task model to jointly consider their relatedness and differences. The proposed model adopts multi-task learning algorithm to map pedestrians in different occlusion levels to a common space, where all models corresponding to different occlusion levels are constrained to share a common set of features, and a boosted detector is then constructed to distinguish pedestrians from background. The proposed approach is evaluated on the challenging Caltech pedestrian detection benchmark, and achieves state-of-the-art results on different occlusion-specific test sets.

artificial intelligence, detection, machine learning, (18 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: Asia > China > Beijing > Beijing (0.04)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.31)

Add feedback

Policy Tree: Adaptive Representation for Policy Gradient

Gupta, Ujjwal Das (University of Alberta) | Talvitie, Erik (Franklin and Marshall College) | Bowling, Michael (University of Alberta)

AAAI ConferencesMar-6-2015

Much of the focus on finding good representations in reinforcement learning has been on learning complex non-linear predictors of value. Policy gradient algorithms, which directly represent the policy, often need fewer parameters to learn good policies. However, they typically employ a fixed parametric representation that may not be sufficient for complex domains. This paper introduces the Policy Tree algorithm, which can learn an adaptive representation of policy in the form of a decision tree over different instantiations of a base policy. Policy gradient is used both to optimize the parameters and to grow the tree by choosing splits that enable the maximum local increase in the expected return of the policy. Experiments show that this algorithm can choose genuinely helpful splits and significantly improve upon the commonly used linear Gibbs softmax policy, which we choose as our base policy.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Add feedback

Extending Analogical Generalization with Near-Misses

McLure, Matthew D. (Northwestern University) | Friedman, Scott E. (Smart Information Flow Technologies (SIFT)) | Forbus, Kenneth D. (Northwestern University)

AAAI ConferencesMar-6-2015

Concept learning is a central problem for cognitive systems. Generalization techniques can help organize examples by their commonalities, but comparisons with non-examples, near-misses, can provide discrimination. Early work on near-misses required hand-selected examples by a teacher who understood the learner’s internal representations. This paper introduces Analogical Learning by Integrating Generalization and Near-misses (ALIGN) and describes three key advances. First, domain-general cognitive models of analogical processes are used to handle a wider range of examples. Second, ALIGN’s analogical generalization process constructs multiple probabilistic representations per concept via clustering, and hence can learn disjunctive concepts. Finally, ALIGN uses unsupervised analogical retrieval to find its own near-miss examples. We show that ALIGN out-performs analogical generalization on two perceptual data sets: (1) hand-drawn sketches; and (2) geospatial concepts from strategy-game maps.

artificial intelligence, logic & formal reasoning, machine learning, (23 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: North America > United States (0.69)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Vision > Sketch Understanding (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.49)
(2 more...)

Add feedback

On the Bayes-optimality of F-measure maximizers

Waegeman, Willem, Dembczynski, Krzysztof, Jachnik, Arkadiusz, Cheng, Weiwei, Hullermeier, Eyke

arXiv.org Machine LearningMar-6-2015

The F-measure, which has originally been introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar type of analysis for F-measure maximizing algorithms, showing that such algorithms are approximate, while relying on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm which is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a quadratic (with respect to the number of binary responses) number of parameters of the joint distribution. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

1310.4849

Country:

Europe (0.92)
North America > United States (0.27)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
(3 more...)

Add feedback

Heteroscedastic Treed Bayesian Optimisation

Assael, John-Alexander M., Wang, Ziyu, Shahriari, Bobak, de Freitas, Nando

arXiv.org Machine LearningMar-4-2015

Optimising black-box functions is important in many disciplines, such as tuning machine learning models, robotics, finance and mining exploration. Bayesian optimisation is a state-of-the-art technique for the global optimisation of black-box functions which are expensive to evaluate. At the core of this approach is a Gaussian process prior that captures our belief about the distribution over functions. However, in many cases a single Gaussian process is not flexible enough to capture non-stationarity in the objective function. Consequently, heteroscedasticity negatively affects performance of traditional Bayesian methods. In this paper, we propose a novel prior model with hierarchical parameter learning that tackles the problem of non-stationarity in Bayesian optimisation. Our results demonstrate substantial improvements in a wide range of applications, including automatic machine learning and mining exploration.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1410.7172

Country:

North America > Canada (0.47)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.15)

Genre: Research Report > New Finding (0.68)

Industry: Materials > Metals & Mining (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback