AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
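As a rough illustration of the quoted idea, the sketch below encodes a small decision tree as nested Python data and evaluates it as a Boolean function. The attributes (`outlook`, `humidity`, `windy`), the tree itself, and the `decide` helper are hypothetical and chosen only for illustration; they do not come from the book.

```python
# Hypothetical tree: should we play outside?
# An internal node is a pair (attribute, {value: subtree}); a leaf is a bool.
TREE = ("outlook", {
    "sunny": ("humidity", {"high": False, "normal": True}),
    "overcast": True,
    "rain": ("windy", {True: False, False: True}),
})


def decide(tree, example):
    """Walk from the root to a leaf, testing one attribute per node."""
    while not isinstance(tree, bool):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


if __name__ == "__main__":
    print(decide(TREE, {"outlook": "sunny", "humidity": "normal"}))
    print(decide(TREE, {"outlook": "rain", "windy": True}))
```

Each root-to-leaf path tests a conjunction of attribute values, so the tree as a whole behaves like a Boolean function of the object's properties.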

Probabilistic Value Selection for Space Efficient Model

Njoo, Gunarto Sindoro, Zheng, Baihua, Hsu, Kuo-Wei, Peng, Wen-Chih

arXiv.org Machine Learning, Jul-9-2020

An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike existing methods such as feature selection, which removes features, and instance selection, which eliminates instances, value selection eliminates values (with respect to each feature) from the dataset with two goals: reducing the model size and preserving its accuracy. Two probabilistic methods based on an information-theoretic metric are proposed: PVS and P+VS. Extensive experiments on benchmark datasets of various sizes are reported. The results are compared with existing preprocessing methods such as feature selection, feature transformation, and instance selection. The experiments show that value selection can balance accuracy against model size reduction.

evolutionary algorithm, machine learning, selection, (16 more...)

2007.04641

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > Singapore (0.05)
(6 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Split a Decision Tree

#artificialintelligence, Jul-8-2020, 19:13:22 GMT

Decision trees are simple to implement and equally easy to interpret, which makes them ideal for machine learning newcomers as well! If you are unsure about even one of these questions, you've come to the right place! The decision tree is a powerful machine learning algorithm that also serves as the building block for other widely used and more complicated machine learning algorithms such as Random Forest, XGBoost, and LightGBM. You can imagine why it's important to learn about this topic!

artificial intelligence, decision tree learning, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

An exploration of the influence of path choice in game-theoretic attribution algorithms

Ward, Geoff, Kamkar, Sean, Budzik, Jay

arXiv.org Artificial Intelligence, Jul-8-2020

We compare machine learning explainability methods based on the theory of atomic (Shapley, 1953) and infinitesimal (Aumann and Shapley, 1974) games, in a theoretical and experimental investigation into how the model and choice of integration path can influence the resulting feature attributions. To gain insight into differences in attributions resulting from interventional Shapley values (Sundararajan and Najmi, 2019; Janzing et al., 2019; Chen et al., 2019) and Generalized Integrated Gradients (GIG) (Merrill et al., 2019), we note that interventional Shapley is equivalent to a multi-path integration along $n!$ paths, where $n$ is the number of model input features. Applying Stokes' theorem, we show that the path symmetry of these two methods results in the same attributions when the model is composed of a sum of separable functions of individual features and a sum of two-feature products. We then perform a series of experiments with varying degrees of data missingness to demonstrate how interventional Shapley's multi-path approach can yield less consistent attributions than the single straight-line path of Aumann-Shapley. We argue this is because the multiple paths employed by interventional Shapley extend away from the training data manifold and are therefore more likely to pass through regions where the model has little support. In the absence of a more meaningful path choice, we therefore advocate the straight-line path since it will almost always pass closer to the data manifold. Among straight-line path attribution algorithms, GIG is uniquely robust since it will still yield Shapley values for atomic games modeled by decision trees.
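The "$n!$ paths" view mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single reference point (`baseline`) rather than a background dataset, and the function name `shapley_by_paths` and the toy model are hypothetical. Each feature ordering defines one axis-aligned path from the baseline to the input, and a feature is credited with the change in model output when it is switched on along that path.

```python
from itertools import permutations


def shapley_by_paths(model, x, baseline):
    """Average each feature's marginal contribution over all n! orderings."""
    n = len(x)
    totals = [0.0] * n
    count = 0
    for order in permutations(range(n)):
        point = list(baseline)      # start of the path
        previous = model(point)
        for i in order:             # move along one axis at a time
            point[i] = x[i]
            current = model(point)
            totals[i] += current - previous
            previous = current
        count += 1
    return [total / count for total in totals]


def toy_model(v):
    # A separable term plus a two-feature product (illustrative only).
    return 2 * v[0] + v[1] * v[2]


if __name__ == "__main__":
    print(shapley_by_paths(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

The attributions sum to `model(x) - model(baseline)`. The loop visits all $n!$ orderings, so this exhaustive form is only practical for a handful of features.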

artificial intelligence, attribution, machine learning, (18 more...)

2007.04169

Country:

North America > United States > California > Los Angeles County > Burbank (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)

Decision Tree vs. Random Forest - Which Algorithm Should you Use?

#artificialintelligence, Jul-7-2020, 15:22:27 GMT

Let's start with a thought experiment that will illustrate the difference between a decision tree and a random forest model. Suppose a bank has to approve a small loan for a customer and needs to make a decision quickly. The bank checks the person's credit history and financial condition and finds that they haven't repaid an older loan yet. Hence, the bank rejects the application. But here's the catch: the loan amount was very small relative to the bank's immense coffers, and the bank could easily have approved it as a very low-risk move. Therefore, the bank lost the chance to make some money.

artificial intelligence, decision tree, machine learning, (15 more...)

Industry: Banking & Finance > Credit (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Certifying Decision Trees Against Evasion Attacks by Program Analysis

Calzavara, Stefano, Ferrara, Pietro, Lucchese, Claudio

arXiv.org Machine Learning, Jul-6-2020

Machine learning has proved invaluable for a range of different tasks, yet it has also proved vulnerable to evasion attacks, i.e., maliciously crafted perturbations of input data designed to force mispredictions. In this paper we propose a novel technique to verify the security of decision tree models against evasion attacks with respect to an expressive threat model, where the attacker can be represented by an arbitrary imperative program. Our approach exploits the interpretability property of decision trees to transform them into imperative programs, which are amenable to traditional program analysis techniques. By leveraging the abstract interpretation framework, we are able to soundly verify the security guarantees of decision tree models trained over publicly available datasets. Our experiments show that our technique is both precise and efficient, yielding only a minimal number of false positives and scaling up to cases which are intractable for a competitor approach.

artificial intelligence, attacker, machine learning, (17 more...)

2007.02771

Country: Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

A Novel Random Forest Dissimilarity Measure for Multi-View Learning

Cao, Hongliu, Bernard, Simon, Sabourin, Robert, Heutte, Laurent

arXiv.org Machine Learning, Jul-6-2020

Multi-view learning is a learning task in which data is described by several concurrent representations. Its main challenge is most often to exploit the complementarities between these representations to help solve a classification/regression task. This is a challenge that can be met nowadays if there is a large amount of data available for learning. However, this is not necessarily true for all real-world problems, where data are sometimes scarce (e.g. problems related to the medical environment). In these situations, an effective strategy is to use intermediate representations based on the dissimilarities between instances. This work presents new ways of constructing these dissimilarity representations, learning them from data with Random Forest classifiers. More precisely, two methods are proposed, which modify the Random Forest proximity measure, to adapt it to the context of High Dimension Low Sample Size (HDLSS) multi-view classification problems. The second method, based on an Instance Hardness measurement, is significantly more accurate than other state-of-the-art measurements including the original RF Proximity measurement and the Large Margin Nearest Neighbor (LMNN) metric learning measurement.
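For context, the classical Random Forest proximity that the abstract refers to is usually defined as the fraction of trees in which two instances land in the same leaf. The sketch below computes it, and a dissimilarity taken as one minus the proximity, from a small hand-written table of leaf indices. The table `LEAVES` and both function names are hypothetical, the "one minus proximity" conversion is an assumption made here for illustration, and the paper's two modified measures are not reproduced.

```python
def rf_proximity(leaves_a, leaves_b):
    """Fraction of trees in which two instances fall into the same leaf."""
    same = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return same / len(leaves_a)


def dissimilarity_matrix(leaf_table):
    """Pairwise dissimilarity, assumed here to be 1 - proximity."""
    return [[1.0 - rf_proximity(row, other) for other in leaf_table]
            for row in leaf_table]


# Hypothetical leaf indices: one row per instance, one column per tree.
LEAVES = [
    [3, 7, 2, 5],
    [3, 7, 4, 5],
    [1, 6, 4, 8],
]

if __name__ == "__main__":
    for row in dissimilarity_matrix(LEAVES):
        print(row)
```

In practice the leaf table would come from a trained forest; here it is written by hand so the example stays self-contained.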

artificial intelligence, machine learning, representation, (18 more...)

2007.02572

Country:

Europe > France > Normandy > Seine-Maritime > Rouen (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.83)

Boost your model's performance with these fantastic libraries

#artificialintelligence, Jul-5-2020, 14:57:40 GMT

Quality is determined by accuracy and completeness. Companies use machine learning models to make practical business decisions, and more accurate model outcomes result in better decisions. The cost of errors can be huge, but optimizing model accuracy mitigates that cost. Machine learning model accuracy is a measurement used to determine which model is best at identifying relationships and patterns between variables in a dataset based on the input, or training, data. The better a model can generalize to 'unseen' data, the better predictions and insights it can produce, which in turn deliver more business value. The dataset which I have chosen is the Breast Cancer Prediction dataset.

accuracy, artificial intelligence, machine learning, (17 more...)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.36)

Building Knowledge on the Customer Through Machine Learning

#artificialintelligence, Jul-5-2020, 08:00:57 GMT

The cost of acquiring new customers is high, so companies are spending more on customer loyalty and retention. Identifying the total value generated by a customer over the entire customer life cycle would help companies in business campaigns and other activities. Customer Relationship Management (CRM) therefore becomes a key element of modern marketing strategies. If we can predict a score that allows us to project quantifiable information onto a given population, then the information system (IS) can use it to personalize the customer relationship. The KDD (Knowledge Discovery and Data Mining) Cup 2009 challenge consists of three tasks: predicting churn, appetency, and upselling from data provided by the telecom company Orange.

classifier, data mining, machine learning, (11 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.32)

Why Choose Random Forest and Not Decision Trees

#artificialintelligence, Jul-5-2020, 06:19:17 GMT

A decision tree is a simple tree-like structure consisting of nodes and branches. At each node, the data is split on one of the input features, generating two or more branches as output. Repeating this process increases the number of branches and partitions the original data further. It continues until a node is reached where all or almost all of the data belong to the same class and further splits (or branches) are no longer possible. The whole process generates a tree-like structure.
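The splitting process described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the article's code: features are categorical, each split creates one branch per feature value, the split is chosen by weighted Gini impurity (one common criterion, assumed here), and the names `gini`, `build_tree`, `predict`, and the toy data are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """Recursively split until a node is pure or no features remain."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class

    def weighted_impurity(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    best = min(features, key=weighted_impurity)
    remaining = [f for f in features if f != best]
    branches = {}
    for value in set(row[best] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        branches[value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return (best, branches)  # internal node: (feature, {value: subtree})


def predict(tree, row):
    """Follow the branches matching the row until a leaf label is reached."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree


# Toy data, for illustration only.
ROWS = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "overcast", "windy": False},
    {"outlook": "overcast", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
LABELS = ["yes", "yes", "yes", "yes", "yes", "no"]

if __name__ == "__main__":
    tree = build_tree(ROWS, LABELS, ["outlook", "windy"])
    print(tree)
    print(predict(tree, {"outlook": "rain", "windy": True}))
```

On this toy data the root splits on `outlook`, because the `sunny` and `overcast` branches are already pure; only the `rain` branch needs a second split on `windy`.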

artificial intelligence, machine learning, node, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
