AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
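To make the quoted idea concrete, here is a minimal Python sketch of a decision tree that represents a Boolean function (XOR in this case). The tuple encoding and the attribute names `a` and `b` are assumptions chosen for illustration and are not taken from the book.

```python
# A decision tree as nested tuples: (attribute, subtree_if_false, subtree_if_true).
# Leaves are plain booleans. The attribute names "a" and "b" are made up.
XOR_TREE = ("a",
            ("b", False, True),   # a is False: the answer equals b
            ("b", True, False))   # a is True: the answer equals not b

def decide(tree, example):
    """Walk from the root to a leaf, testing one attribute per level."""
    while not isinstance(tree, bool):
        attribute, if_false, if_true = tree
        tree = if_true if example[attribute] else if_false
    return tree

# Evaluating the tree on every input recovers the full truth table.
truth_table = {(a, b): decide(XOR_TREE, {"a": a, "b": b})
               for a in (False, True) for b in (False, True)}
```

Each root-to-leaf path corresponds to one row group of the truth table, which is why any Boolean function can be written this way, although some functions need trees that grow quickly with the number of attributes.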

DeepSym: Deep Symbol Generation and Rule Learning for Planning from Unsupervised Robot Interaction

Ahmetoglu, Alper (Bogazici University) | Seker, M. Yunus (Bogazici University) | Piater, Justus (University of Innsbruck) | Oztop, Erhan (Osaka University, Ozyegin University) | Ugur, Emre (Bogazici University)

Journal of Artificial Intelligence Research, Nov-6-2022

Symbolic planning and reasoning are powerful tools for robots tackling complex tasks. However, the need to manually design the symbols restricts their applicability, especially for robots that are expected to act in open-ended environments. Therefore, symbol formation and rule extraction should be considered part of robot learning, which, when done properly, will offer scalability, flexibility, and robustness. Towards this goal, we propose a novel general method that finds action-grounded, discrete object and effect categories and builds probabilistic rules over them for non-trivial action planning. Our robot interacts with objects using an initial action repertoire that is assumed to be acquired earlier and observes the effects it can create in the environment. To form action-grounded object, effect, and relational categories, we employ a binary bottleneck layer in a predictive, deep encoder-decoder network that takes the image of the scene and the action applied as input, and generates the resulting effects in the scene in pixel coordinates. After learning, the binary latent vector represents action-driven object categories based on the interaction experience of the robot. To distill the knowledge represented by the neural network into rules useful for symbolic reasoning, a decision tree is trained to reproduce its decoder function. Probabilistic rules are extracted from the decision paths of the tree and are represented in the Probabilistic Planning Domain Definition Language (PPDDL), allowing off-the-shelf planners to operate on the knowledge extracted from the sensorimotor experience of the robot. The deployment of the proposed approach for a simulated robotic manipulator enabled the discovery of discrete representations of object properties such as ‘rollable’ and ‘insertable’. In turn, the use of these representations as symbols allowed the generation of effective plans for achieving goals, such as building towers of the desired height, demonstrating the effectiveness of the approach for multi-step object manipulation. Finally, we demonstrate that the system is not restricted to the robotics domain by assessing its applicability to the MNIST 8-puzzle domain, in which learned symbols allow for the generation of plans that move the empty tile into any given position.
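The rule-extraction step described in this abstract can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' implementation: the tree encoding, the symbol indices, the effect labels, and the sample counts are all invented, and the output is a plain list of rules rather than actual PPDDL.

```python
# Internal node: (symbol_index, subtree_if_0, subtree_if_1).
# Leaf: dict mapping an effect label to how many training samples reached it.
# The symbols, effect names, and counts below are invented for illustration.
TREE = (0,
        {"stays": 8, "rolls": 2},
        (1,
         {"rolls": 9, "stays": 1},
         {"inserted": 5}))

def extract_rules(node, path=()):
    """Yield one (preconditions, effect_probabilities) pair per leaf."""
    if isinstance(node, dict):
        total = sum(node.values())
        yield path, {effect: count / total for effect, count in node.items()}
        return
    symbol, if_zero, if_one = node
    yield from extract_rules(if_zero, path + ((symbol, 0),))
    yield from extract_rules(if_one, path + ((symbol, 1),))

rules = list(extract_rules(TREE))
for preconditions, effects in rules:
    print(preconditions, "->", effects)
```

Each decision path becomes a conjunction of binary symbol tests (the preconditions), and the class frequencies at the leaf become the effect probabilities, which is the shape a probabilistic planning operator needs.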

category, deep symbol generation, representation, (14 more...)

doi: 10.1613/jair.1.13754

AI Access Foundation

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Decision Trees in Python: Predicting Diabetes - Statistically Relevant

#artificialintelligence, Nov-5-2022, 15:56:16 GMT

In this post, we'll learn about decision trees, how they work, and the benefits of using them. We'll also apply the algorithm to real-world data to predict diabetes. So, what are decision trees? Decision trees are a machine learning method for classification or regression. They work by segmenting the dataset through if-else control statements applied to the features.
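As a rough illustration of "if-else statements applied to the features", the Python sketch below learns a single split (a depth-one tree) by minimizing weighted Gini impurity. It is not the post's code: the feature name `glucose`, the toy numbers, and the helper names are assumptions, and the data are not real patient records.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_threshold(values, labels):
    """Return the threshold t that minimizes the weighted Gini of the split value <= t."""
    best = (float("inf"), None)
    for t in sorted(set(values))[:-1]:   # skipping the max keeps both sides non-empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[0]:
            best = (score, t)
    return best[1]

# Toy numbers, not real patient data.
glucose = [85, 90, 110, 140, 150, 165]
diabetic = [0, 0, 0, 1, 1, 1]
threshold = best_threshold(glucose, diabetic)

def predict(value):
    # The learned model is literally an if-else on the feature.
    if value <= threshold:
        return 0
    return 1
```

A full tree repeats this search recursively on each side of the split; libraries add stopping rules such as a maximum depth to limit overfitting.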

algorithm, decision tree, node, (14 more...)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Time series quantile regression using random forests

Shiraishi, Hiroshi, Nakamura, Tomoshige, Shibuki, Ryotato

arXiv.org Machine Learning, Nov-4-2022

We discuss an application of Generalized Random Forests (GRF) proposed by Athey et al. (2019) to quantile regression for time series data. We extended the theoretical results of the GRF consistency for i.i.d. data to time series data. In particular, in the main theorem, based only on the general assumptions for time series data in Davis and Nielsen (2020), and trees in Athey et al. (2019), we show that the tsQRF (time series Quantile Regression Forests) estimator is consistent. Davis and Nielsen (2020) also discussed the estimation problem using Random Forests (RF) for time series data, but the construction procedure of the RF treated by the GRF is essentially different, and different ideas are used throughout the theoretical proof. In addition, a simulation and real data analysis were conducted. In the simulation, the accuracy of the conditional quantile estimation was evaluated under time series models. In the real data analysis using the Nikkei Stock Average, our estimator is demonstrated to be more sensitive than the others in terms of volatility, thus preventing underestimation of risk.
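Forest-based quantile regression typically reads a conditional quantile off a weighted empirical distribution of the training responses, with weights reflecting how often each training point falls in the same leaf as the query. The Python sketch below shows only that final step under stated assumptions: the responses and weights are invented, and how the paper's tsQRF estimator derives its weights is not reproduced here.

```python
def weighted_quantile(responses, weights, q):
    """Smallest response whose cumulative weight reaches a fraction q of the total."""
    total = sum(weights)
    cumulative = 0
    for y, w in sorted(zip(responses, weights)):
        cumulative += w
        if cumulative >= q * total:
            return y
    return max(responses)  # guards against floating-point shortfall

# Invented training responses and the weights a forest might assign them for
# one query point (here: in how many of 10 trees each shares a leaf with it).
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
w_query = [1, 3, 3, 2, 1]

median = weighted_quantile(y_train, w_query, 0.5)
upper = weighted_quantile(y_train, w_query, 0.95)
```

Because the whole weighted distribution is available, the same weights yield any quantile, which is what makes the approach useful for risk measures on financial series.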

artificial intelligence, estimator, machine learning, (15 more...)

arXiv:2211.02273

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.82)

Making Machine Learning Datasets and Models FAIR for HPC: A Methodology and Case Study

Lin, Pei-Hung, Liao, Chunhua, Chen, Winson, Vanderbruggen, Tristan, Emani, Murali, Xu, Hailu

arXiv.org Artificial Intelligence, Nov-3-2022

The FAIR Guiding Principles aim to improve the findability, accessibility, interoperability, and reusability of digital content by making it both human- and machine-actionable. However, these principles have not yet been broadly adopted in the domain of machine learning-based program analyses and optimizations for High-Performance Computing (HPC). In this paper, we design a methodology to make HPC datasets and machine learning models FAIR after investigating existing FAIRness assessment and improvement techniques. Our methodology includes a comprehensive, quantitative assessment for selected data, followed by concrete, actionable suggestions to improve FAIRness with respect to common issues related to persistent identifiers, rich metadata descriptions, license and provenance information. Moreover, we select a representative training dataset to evaluate our methodology. The experiment shows the methodology can effectively improve the dataset and model's FAIRness from an initial score of 19.1% to a final score of 83.0%.

artificial intelligence, machine learning, metadata, (18 more...)

arXiv:2211.02092

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Lemont (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)

Optimization of Oblivious Decision Tree Ensembles Evaluation for CPU

Mironov, Alexey, Khuziev, Ilnur

arXiv.org Artificial Intelligence, Nov-1-2022

CatBoost is a popular machine learning library. CatBoost models are based on oblivious decision trees, making training and evaluation rapid. CatBoost has many applications, and some require low latency and high throughput evaluation. This paper investigates the possibilities for improving CatBoost's performance in single-core CPU computations. We explore the new features provided by the AVX instruction sets to optimize evaluation. We increase performance by 20-40% using AVX2 instructions without quality impact. We also introduce a new trade-off between speed and quality. Using float16 for leaf values and AVX-512 instructions, we achieve 50-70% speed-up.
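An oblivious decision tree applies the same (feature, threshold) test to every node at a given depth, so evaluation reduces to computing one bit per level and using the bits as an index into a leaf array. The Python sketch below illustrates that idea only; the bit order, the `>` comparison, and the numbers are assumptions and do not reflect CatBoost's internal layout or the paper's AVX code.

```python
def eval_oblivious_tree(splits, leaf_values, x):
    """splits: one (feature_index, threshold) per level; leaf_values has 2**depth entries."""
    index = 0
    for level, (feature, threshold) in enumerate(splits):
        bit = 1 if x[feature] > threshold else 0
        index |= bit << level          # each level contributes one bit of the leaf index
    return leaf_values[index]

# A depth-2 tree with invented splits and leaf values.
splits = [(0, 0.5), (1, 2.0)]
leaf_values = [10.0, 20.0, 30.0, 40.0]

prediction = eval_oblivious_tree(splits, leaf_values, [0.7, 1.0])
```

Since every sample runs the same comparisons regardless of the outcome of earlier ones, the loop has no data-dependent branching, which is the property that makes this structure a natural fit for vectorized instructions.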

artificial intelligence, catboost, machine learning, (16 more...)

arXiv:2211.00391

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.60)

HARRIS: Hybrid Ranking and Regression Forests for Algorithm Selection

Fehring, Lukas, Hanselle, Jonas, Tornede, Alexander

arXiv.org Artificial Intelligence, Oct-31-2022

It is well known that different algorithms perform differently well on an instance of an algorithmic problem, motivating algorithm selection (AS): Given an instance of an algorithmic problem, which is the most suitable algorithm to solve it? As such, the AS problem has received considerable attention, resulting in various approaches, many of which solve either a regression or a ranking problem under the hood. Although both of these formulations yield very natural ways to tackle AS, they have considerable weaknesses. On the one hand, correctly predicting the performance of an algorithm on an instance is a sufficient, but not a necessary, condition to produce a correct ranking over algorithms and, in particular, to rank the best algorithm first. On the other hand, classical ranking approaches often do not account for concrete performance values available in the training data, but only leverage rankings composed from such data. We propose HARRIS (Hybrid rAnking and RegRessIon foreSts), a new algorithm selector leveraging special forests, combining the strengths of both approaches while alleviating their weaknesses. HARRIS' decisions are based on a forest model, whose trees are created based on splits optimized on a hybrid ranking and regression loss function. As our preliminary experimental study on ASLib shows, HARRIS improves over standard algorithm selection approaches on some scenarios, showing that combining ranking and regression in trees is indeed promising for AS.
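To illustrate what a hybrid ranking and regression loss could look like, the Python sketch below takes a convex combination of mean squared error and the fraction of misordered algorithm pairs. This particular form, the weight `lam`, and the runtimes are assumptions made for illustration; the abstract does not specify the loss that HARRIS actually optimizes.

```python
from itertools import combinations

def squared_error(pred, true):
    """Mean squared error between predicted and true performance values."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def discordant_fraction(pred, true):
    """Share of algorithm pairs whose predicted order contradicts the true order."""
    pairs = list(combinations(range(len(true)), 2))
    bad = sum(1 for i, j in pairs
              if (pred[i] - pred[j]) * (true[i] - true[j]) < 0)
    return bad / len(pairs)

def hybrid_loss(pred, true, lam=0.5):
    # lam = 1 is pure regression, lam = 0 is pure ranking (hypothetical weighting).
    return lam * squared_error(pred, true) + (1 - lam) * discordant_fraction(pred, true)

# Invented runtimes of three algorithms on one instance (lower is better).
true_runtimes = [1.0, 2.0, 3.0]
closer_but_misordered = [2.1, 1.9, 3.0]   # smaller squared error, wrong winner
biased_but_ordered = [2.0, 3.0, 4.0]      # larger squared error, correct ranking
```

The two prediction vectors show the tension the abstract describes: the first is closer in value yet picks the wrong best algorithm, while the second is off by a constant but ranks perfectly.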

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv:2210.17341

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Connecticut > Fairfield County > Norwalk (0.04)
North America > United States > Connecticut > Fairfield County > East Norwalk (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.46)

Evaluation Metrics for Symbolic Knowledge Extracted from Machine Learning Black Boxes: A Discussion Paper

Sabbatini, Federico, Calegari, Roberta

arXiv.org Artificial Intelligence, Oct-31-2022

As opaque decision systems are being increasingly adopted in almost any application field, issues about their lack of transparency and human readability are a concrete concern for end-users. Amongst existing proposals to associate human-interpretable knowledge with accurate predictions provided by opaque models, there are rule extraction techniques, capable of extracting symbolic knowledge out of an opaque model. However, how to assess the level of readability of the extracted knowledge quantitatively is still an open issue. Finding such a metric would be the key, for instance, to enable automatic comparison between a set of different knowledge representations, paving the way for the development of parameter autotuning algorithms for knowledge extractors. In this paper we discuss the need for such a metric as well as the criticalities of readability assessment and evaluation, taking into account the most common knowledge representations while highlighting the most puzzling issues.

artificial intelligence, knowledge, machine learning, (19 more...)

arXiv:2211.00238

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.05)
(8 more...)

Genre: Research Report (0.64)

Industry:

Banking & Finance (1.00)
Transportation > Air (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

FedMint: Intelligent Bilateral Client Selection in Federated Learning with Newcomer IoT Devices

Wehbi, Osama, Arisdakessian, Sarhad, Wahab, Omar Abdel, Otrok, Hadi, Otoum, Safa, Mourad, Azzam, Guizani, Mohsen

arXiv.org Artificial Intelligence, Oct-31-2022

Federated Learning (FL) is a novel distributed privacy-preserving learning paradigm, which enables the collaboration among several participants (e.g., Internet of Things devices) for the training of machine learning models. However, selecting the participants that would contribute to this collaborative training is highly challenging. Adopting a random selection strategy would entail substantial problems due to the heterogeneity in terms of data quality, and computational and communication resources across the participants. Although several approaches have been proposed in the literature to overcome the problem of random selection, most of these approaches follow a unilateral selection strategy. In fact, they base their selection strategy on only the federated server's side, while overlooking the interests of the client devices in the process. To overcome this problem, we present in this paper FedMint, an intelligent client selection approach for federated learning on IoT devices using game theory and a bootstrapping mechanism. Our solution involves the design of: (1) preference functions for the client IoT devices and federated servers to allow them to rank each other according to several factors such as accuracy and price, (2) intelligent matching algorithms that take into account the preferences of both parties in their design, and (3) a bootstrapping technique that capitalizes on the collaboration of multiple federated servers in order to assign an initial accuracy value to the newly connected IoT devices. Based on our simulation findings, our strategy surpasses the VanillaFL selection approach in terms of maximizing both the revenues of the client devices and the accuracy of the global federated learning model.
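Two-sided selection of the kind described in item (2) can be illustrated with the classical deferred-acceptance (Gale-Shapley) procedure. The Python sketch below is a stand-in for illustration, not FedMint's matching algorithm: it assumes one client per server, complete server rankings, and invented preference lists.

```python
def deferred_acceptance(client_prefs, server_prefs):
    """One-to-one stable matching; clients propose in order of preference."""
    rank = {s: {c: i for i, c in enumerate(prefs)}
            for s, prefs in server_prefs.items()}
    next_choice = {c: 0 for c in client_prefs}
    held = {}                      # server -> client it currently holds
    free = list(client_prefs)
    while free:
        client = free.pop()
        if next_choice[client] >= len(client_prefs[client]):
            continue               # client exhausted its list and stays unmatched
        server = client_prefs[client][next_choice[client]]
        next_choice[client] += 1
        current = held.get(server)
        if current is None:
            held[server] = client
        elif rank[server][client] < rank[server][current]:
            held[server] = client  # server prefers the new proposer
            free.append(current)
        else:
            free.append(client)    # rejected, will try its next choice
    return {client: server for server, client in held.items()}

# Hypothetical preferences: both clients want s1, but s1 prefers c2.
client_prefs = {"c1": ["s1", "s2"], "c2": ["s1", "s2"]}
server_prefs = {"s1": ["c2", "c1"], "s2": ["c1", "c2"]}
matching = deferred_acceptance(client_prefs, server_prefs)
```

The result is stable in the sense that no client and server would both rather be paired with each other than with their assigned partner, which is the property a bilateral scheme needs and a server-only selection cannot guarantee.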

artificial intelligence, iot device, machine learning, (17 more...)

arXiv:2211.01805

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > Canada > Quebec > Montreal (0.05)
Africa (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.48)

Using Interpretable Machine Learning to Massively Increase the Number of Antibody-Virus Interactions Across Studies

Einav, Tal, Ma, Rong

arXiv.org Artificial Intelligence, Oct-30-2022

Department of Statistics, Stanford University, Stanford, California, United States of America. *Authors contributed equally to this work. Correspondence should be addressed to teinav@fredhutch.org.

Abstract: A central challenge in every field of biology is to use existing measurements to predict the outcomes of future experiments. In this work, we consider the wealth of antibody inhibition data against variants of the influenza virus. Due to this virus's genetic diversity and evolvability, the variants examined in one study will often have little-to-no overlap with other studies, making it difficult to discern common patterns or unify datasets for further analysis. To that end, we develop a computational framework that predicts how an antibody or serum would inhibit any variant from any other study. We use this framework to greatly expand seven influenza datasets utilizing hemagglutination inhibition, validating our method upon 200,000 existing measurements and predicting 2,000,000 new values with uncertainties. With these new values, we quantify the transferability between seven vaccination and infection studies in humans and ferrets, show that the serum potency is negatively correlated with breadth, and present a tool for pandemic preparedness. This data-driven approach does not require any information beyond each virus's name and measurements, and even datasets with as few as 5 viruses can be expanded, making this approach widely applicable. Future influenza studies using hemagglutination inhibition can directly utilize our curated datasets to predict newly measured antibody responses against 80 H3N2 influenza viruses from 1968-2011, whereas immunological studies utilizing other viruses or a different assay only need a single partially-overlapping dataset to extend their work. In essence, this approach enables a shift in perspective when analyzing data from "what you see is what you get" into "what anyone sees is what everyone gets."

Introduction: Our understanding of how antibody-mediated immunity drives viral evolution and escape relies upon painstaking measurements of antibody binding, inhibition, or neutralization against variants of concern (Petrova and Russell, 2017). Every interaction is unique because: (1) the antibody response (serum) changes even in the absence of viral exposure and (2) for rapidly evolving viruses such as influenza, the specific variants examined in one study will often have little-to-no overlap with other studies (Figure 1).

artificial intelligence, machine learning, virus, (16 more...)

doi: 10.1016/j.crmeth.2023.100540

arXiv:2206.14566

Country:

North America > United States > California > Santa Clara County > Stanford (0.24)
Asia > Vietnam > Hanoi > Hanoi (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
(8 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.96)

Explainable Predictive Decision Mining for Operational Support

Park, Gyunam, Küsters, Aaron, Tews, Mara, Pitsch, Cameron, Schneider, Jonathan, van der Aalst, Wil M. P.

arXiv.org Artificial Intelligence, Oct-30-2022

Several decision points exist in business processes (e.g., whether a purchase order needs a manager's approval or not), and different decisions are made for different process instances based on their characteristics (e.g., a purchase order higher than €500 needs a manager's approval). Decision mining in process mining aims to describe/predict the routing of a process instance at a decision point of the process. By predicting the decision, one can take proactive actions to improve the process. For instance, when a bottleneck is developing in one of the possible decisions, one can predict the decision and bypass the bottleneck. However, despite its huge potential for such operational support, existing techniques for decision mining have focused largely on describing decisions but not on predicting them, deploying decision trees to produce logical expressions to explain the decision. In this work, we aim to enhance the predictive capability of decision mining to enable proactive operational support by deploying more advanced machine learning algorithms. Our proposed approach provides explanations of the predicted decisions using SHAP values to support the elicitation of proactive actions. We have implemented a Web application to support the proposed approach and evaluated the approach using the implementation.
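SHAP values are based on Shapley values, which attribute a prediction to features by averaging each feature's marginal contribution over all orders in which features can be revealed. The Python sketch below computes them exactly by brute force for a tiny rule. It does not use the SHAP library, relies on a single baseline instead of a background dataset, and the `trusted_supplier` feature and the rule itself are hypothetical additions to the abstract's purchase-order example.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contributions over all feature orders."""
    features = list(instance)
    totals = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        current = dict(baseline)        # start from the baseline input
        previous = predict(current)
        for f in order:
            current[f] = instance[f]    # reveal one feature of the instance
            value = predict(current)
            totals[f] += value - previous
            previous = value
    return {f: total / len(orders) for f, total in totals.items()}

def needs_approval(order):
    # Hypothetical decision rule echoing the abstract's purchase-order example.
    return 1.0 if order["amount"] > 500 and not order["trusted_supplier"] else 0.0

instance = {"amount": 800, "trusted_supplier": False}
baseline = {"amount": 100, "trusted_supplier": True}
phi = shapley_values(needs_approval, instance, baseline)
```

The attributions sum to the difference between the prediction for the instance and for the baseline, so here each of the two features receives half of the credit for the approval requirement. Brute force scales factorially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.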

artificial intelligence, decision tree learning, machine learning, (18 more...)

arXiv:2210.16786

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
Europe > Germany (0.04)
(4 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
