AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
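The quoted definition can be illustrated with a minimal sketch: a tree of attribute tests whose leaves hold yes/no decisions, so that the tree as a whole computes a Boolean function. The class names, the `decide` helper, and the attributes `hungry` and `raining` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    decision: bool              # the yes/no output stored at a leaf


@dataclass
class Node:
    attribute: str              # property tested at this node
    if_true: "Tree"             # subtree followed when the property holds
    if_false: "Tree"            # subtree followed otherwise


Tree = Union[Leaf, Node]


def decide(tree: Tree, example: Dict[str, bool]) -> bool:
    """Walk from the root to a leaf, following the outcome of each test."""
    while isinstance(tree, Node):
        tree = tree.if_true if example[tree.attribute] else tree.if_false
    return tree.decision


# Hypothetical tree encoding the Boolean function: hungry AND NOT raining
tree = Node("hungry", Node("raining", Leaf(False), Leaf(True)), Leaf(False))
```

Calling `decide(tree, {"hungry": True, "raining": False})` follows the `hungry` branch, then the `raining` test, and ends at a "yes" leaf.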

An Explainable AI System for the Diagnosis of High Dimensional Biomedical Data

Ultsch, Alfred, Hoffmann, Jörg, Röhnert, Maximilian, Von Bonin, Malte, Oelschlägel, Uta, Brendel, Cornelia, Thrun, Michael C.

arXiv.org Artificial Intelligence, Jul-5-2021

ABSTRACT

Typical state-of-the-art flow cytometry data samples consist of measures of more than 100,000 cells in 10 or more features. AI systems are able to diagnose such data with almost the same accuracy as human experts. However, there is one central challenge in such systems: their decisions have far-reaching consequences for the health and life of people, and therefore the decisions of AI systems need to be understandable and justifiable by humans. In this work, we present a novel explainable AI method, called ALPODS, which is able to classify (diagnose) cases based on clusters, i.e., subpopulations, in the high-dimensional data. ALPODS is able to explain its decisions in a form that is understandable for human experts. For the identified subpopulations, fuzzy reasoning rules expressed in the typical language of domain experts are generated. A visualization method based on these rules allows human experts to understand the reasoning used by the AI system. A comparison to a selection of state-of-the-art explainable AI systems shows that ALPODS operates efficiently on known benchmark data and also on everyday routine case data.

KEYWORDS: Explainable AI, Expert System, Symbolic System, Biomedical Data

1. INTRODUCTION

State-of-the-art machine learning (ML) and artificial intelligence (AI) algorithms can diagnose (classify) high-dimensional data sets in modern medicine effectively and efficiently, e.g., for multiparameter flow cytometry data [Hu et al., 2019; Zhao et al., 2020]. These are systems that, after a training (learning) phase using learning data, perform well on data that are not part of the training data, i.e., the test data. This is called supervised learning [Murphy, 2012].

algorithm, diagnosis, subpopulation, (15 more...)

2107.0182

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

The Role of "Live" in Livestreaming Markets: Evidence Using Orthogonal Random Forest

Cong, Ziwei, Liu, Jia, Manchanda, Puneet

arXiv.org Machine Learning, Jul-4-2021

The common belief about the growing medium of livestreaming is that its value lies in its "live" component. In this paper, we leverage data from a large livestreaming platform to examine this belief. We are able to do this as this platform also allows viewers to purchase the recorded version of the livestream. We summarize the value of livestreaming content by estimating how demand responds to price before, on the day of, and after the livestream. We do this by proposing a generalized Orthogonal Random Forest framework. This framework allows us to estimate heterogeneous treatment effects in the presence of high-dimensional confounders whose relationships with the treatment policy (i.e., price) are complex but partially known. We find significant dynamics in the price elasticity of demand over the temporal distance to the scheduled livestreaming day and after. Specifically, demand gradually becomes less price sensitive as the livestreaming day approaches and is inelastic on the livestreaming day. Over the post-livestream period, demand is still sensitive to price, but much less so than in the pre-livestream period. This indicates that the value of livestreaming persists beyond the live component. Finally, we provide suggestive evidence for the likely mechanisms driving our results. These are quality uncertainty reduction for the patterns pre- and post-livestream and the potential of real-time interaction with the creator on the day of the livestream.

consumer, creator, price elasticity, (15 more...)

2107.01629

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Michigan (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Retail (0.46)
Information Technology > Services (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Research on Metro Service Quality Improvement Schemes Considering Feasibility

Weiya, Chen, Jiajia, Li, Zixuan, Kang

arXiv.org Artificial Intelligence, Jul-3-2021

Formulating reasonable improvement schemes based on the results of service quality surveys is an important management task for metro agencies. Considering the scores, weights, and improvement feasibility of service quality attributes in a certain period, this paper integrates Decision Tree (DT) into Importance-Performance Analysis (IPA) to build a DT-IPA model, which is used to determine the improvement priority of attributes and to quantify the degree of improvement. If-then rules extracted from the optimal decision tree and the improvement feasibility computed by the analytic hierarchy process are the two main items derived from the DT-IPA model. They are used to optimize the initial improvement priority of attributes determined by IPA and to quantify the degree of improvement of the adjusted attributes. Then, the overall service quality can reach a high score, realizing the operation goal. The effectiveness of the DT-IPA model was verified through an empirical study conducted on Changsha Metro, China. The proposed method can be a decision-making tool for metro agency managers to improve the quality of metro service.

importance-performance analysis, metro service quality improvement scheme, south china university, (7 more...)

2107.05558

Country:

Europe > Italy (0.14)
Asia > China > Beijing > Beijing (0.06)
North America > United States (0.05)
(4 more...)

Genre: Research Report (0.40)

Industry: Transportation > Ground > Rail (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.55)

Decision tree heuristics can fail, even in the smoothed setting

Blanc, Guy, Lange, Jane, Qiao, Mingda, Tan, Li-Yang

arXiv.org Machine Learning, Jul-2-2021

Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets $f$ that are $k$-juntas, they showed that these heuristics successfully learn $f$ with depth-$k$ decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-$k$ decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-$k$ decision trees and show that even in the smoothed setting, these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to $k$-juntas, for which these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy.
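As a rough illustration of why greedy, impurity-based heuristics can struggle, the sketch below computes the information gain of every single-variable split for two targets under the uniform distribution on three bits. It is a standard textbook-style example, not the construction from this paper and not the smoothed setting; the helper names `entropy` and `information_gain` are hypothetical.

```python
from itertools import product
from math import log2


def entropy(labels):
    """Binary Shannon entropy of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one binary feature."""
    remainder = 0.0
    for value in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


# Uniform distribution over three bits; the third bit is irrelevant to both targets.
rows = list(product((0, 1), repeat=3))
xor_labels = [x[0] ^ x[1] for x in rows]   # parity of the first two bits
and_labels = [x[0] & x[1] for x in rows]   # conjunction of the first two bits

xor_gains = [information_gain(rows, xor_labels, i) for i in range(3)]
and_gains = [information_gain(rows, and_labels, i) for i in range(3)]
```

For the parity target every candidate split has zero gain, so a greedy learner has no signal to prefer the two relevant bits over the irrelevant one. For the conjunction, the relevant bits have strictly positive gain and the irrelevant bit has none.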

decision tree, memory bit, product distribution, (15 more...)

2107.00819

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Fair Decision Rules for Binary Classification

Lawless, Connor, Gunluk, Oktay

arXiv.org Artificial Intelligence, Jul-2-2021

In recent years, machine learning has begun automating decision making in fields as varied as college admissions, credit lending, and criminal sentencing. The socially sensitive nature of some of these applications, together with increasing regulatory constraints, has created a need for algorithms that are both fair and interpretable. In this paper we consider the problem of building Boolean rule sets in disjunctive normal form (DNF), an interpretable model for binary classification, subject to fairness constraints. We formulate the problem as an integer program that maximizes classification accuracy with explicit constraints on two different measures of classification parity: equality of opportunity and equalized odds. A column generation framework, with a novel formulation, is used to efficiently search over exponentially many possible rules. When combined with faster heuristics, our method can deal with large data sets. Compared to other fair and interpretable classifiers, our method is able to find rule sets that meet stricter notions of fairness with a modest trade-off in accuracy.
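To make the terms concrete, here is a minimal sketch of how a fixed DNF rule set classifies examples and how the two parity measures could be checked afterwards. It does not implement the integer program or the column generation search. It assumes the common definitions in which equality of opportunity compares true positive rates across groups and equalized odds compares both true and false positive rates. The rule set, feature names, groups, and data are hypothetical.

```python
def predict_dnf(rule_set, x):
    """DNF prediction: True if any rule (a conjunction of feature tests) fires."""
    return any(all(x[f] == v for f, v in rule.items()) for rule in rule_set)


def positive_rate(preds, labels, groups, group, label):
    """Share of positive predictions among members of `group` with true label `label`."""
    hits = [p for p, y, g in zip(preds, labels, groups) if g == group and y == label]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical rule set: (employed AND has_collateral) OR high_income
rules = [{"employed": True, "has_collateral": True}, {"high_income": True}]

# Hypothetical toy data: (employed, has_collateral, high_income, true label, group)
data = [
    (True, True, False, 1, "A"),
    (True, False, False, 1, "A"),
    (False, False, True, 0, "A"),
    (False, False, False, 0, "A"),
    (True, True, True, 1, "B"),
    (False, False, True, 1, "B"),
    (False, False, False, 0, "B"),
    (True, False, False, 0, "B"),
]
names = ("employed", "has_collateral", "high_income")
preds = [predict_dnf(rules, dict(zip(names, row[:3]))) for row in data]
labels = [row[3] for row in data]
groups = [row[4] for row in data]

# Equality of opportunity: gap in true positive rates between the two groups.
tpr_gap = abs(positive_rate(preds, labels, groups, "A", 1)
              - positive_rate(preds, labels, groups, "B", 1))
# Equalized odds additionally considers the gap in false positive rates.
fpr_gap = abs(positive_rate(preds, labels, groups, "A", 0)
              - positive_rate(preds, labels, groups, "B", 0))
equalized_odds_gap = max(tpr_gap, fpr_gap)
```

A fairness-constrained learner in this spirit would bound such gaps by a tolerance while searching over rule sets; here they are only measured for one given rule set.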

constraint, formulation, hamming loss, (15 more...)

2107.01325

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law (0.66)
Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

How to learn machine learning in just 10 days

#artificialintelligence, Jul-1-2021, 16:45:52 GMT

In this article, we will talk about how to learn machine learning in just 10 days. It sounds easy, but we will have to make an effort to complete the goal. In the very first days, we…

breast cancer, just 10, learn machine, (2 more...)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.36)

Towards a fairer reimbursement system for burn patients using cost-sensitive classification

Onah, Chimdimma Noelyn, Allmendinger, Richard, Handl, Julia, Dunn, Ken W.

arXiv.org Machine Learning, Jul-1-2021

The adoption of the Prospective Payment System (PPS) in the UK National Health Service (NHS) has led to the creation of patient groups called Health Resource Groups (HRG). HRGs aim to identify groups of clinically similar patients that share similar resource usage for reimbursement purposes. These groups are predominantly identified based on expert advice, with homogeneity checked using the length of stay (LOS). However, for complex patients such as those encountered in burn care, LOS is not a perfect proxy of resource usage, leading to incomplete homogeneity checks. To improve homogeneity in resource usage and severity, we propose a data-driven model and the inclusion of patient-level costing. We investigate whether a data-driven approach that considers additional measures of resource usage can lead to a more comprehensive model. In particular, a cost-sensitive decision tree model is adopted to identify features of importance and rules that allow for a focused segmentation on resource usage (LOS and patient-level cost) and clinical similarity (severity of burn). The proposed approach identified groups with increased homogeneity compared to the current HRG groups, allowing for a more equitable reimbursement of hospital care costs if adopted.

burn patient, dshealth, homogeneity, (12 more...)

2107.00531

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.14)
Europe > United Kingdom > Wales (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Health Care Providers & Services (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Efficient Detection of Botnet Traffic by features selection and Decision Trees

Velasco-Mata, Javier, González-Castro, Víctor, Fidalgo, Eduardo, Alegre, Enrique

arXiv.org Artificial Intelligence, Jun-30-2021

Botnets are one of the most prevalent online threats, causing losses in the billions to global economies. Nowadays, the increasing number of devices connected to the Internet makes it necessary to analyze large amounts of network traffic data. In this work, we focus on increasing the performance of botnet traffic classification by selecting those features that further increase the detection rate. For this purpose we use two feature selection techniques, Information Gain and Gini Importance, which led to three pre-selected subsets of five, six and seven features. Then, we evaluate the three feature subsets along with three models: Decision Tree, Random Forest and k-Nearest Neighbors. To test the performance of the three feature vectors and the three models we generate two datasets based on the CTU-13 dataset, namely QB-CTU13 and EQB-CTU13. We measure the performance as the macro-averaged F1 score over the computational time required to classify a sample. The results show that the highest performance is achieved by Decision Trees using a five-feature set, which obtained a mean F1 score of 85%, classifying each sample in an average time of 0.78 microseconds.
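The performance measure described above can be sketched as follows: a macro-averaged F1 score (the unweighted mean of per-class F1 scores) divided by the per-sample classification time. The exact form of the ratio and the function names `macro_f1` and `f1_per_time` are assumptions made for illustration, and the labels are hypothetical toy data rather than CTU-13 traffic.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def f1_per_time(f1, seconds_per_sample):
    """Assumed form of the ratio: macro F1 divided by per-sample classification time."""
    return f1 / seconds_per_sample


# Hypothetical toy labels
y_true = ["bot", "bot", "normal", "normal", "normal", "bot"]
y_pred = ["bot", "normal", "normal", "normal", "bot", "bot"]
score = macro_f1(y_true, y_pred)
```

Under this measure, a slightly less accurate model can still rank first if it classifies each sample much faster.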

dataset, detection, subset, (16 more...)

doi: 10.1109/access.2021.3108222

2107.02896

Country:

North America > United States (0.14)
South America (0.04)
North America > Central America (0.04)
(4 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Spark MLlib on AWS Glue

#artificialintelligence, Jun-29-2021, 01:16:15 GMT

AWS pushes Sagemaker as its machine learning platform. However, Spark's MLlib is a comprehensive library that runs distributed ML natively on AWS Glue -- and provides a viable alternative to their primary ML platform. One of the big benefits of Sagemaker is that it easily supports experimentation via its Jupyter Notebooks. But operationalising your Sagemaker ML can be difficult, particularly if you need to include ETL processing at the start of your pipeline. In this situation, Apache Spark's MLlib running on AWS Glue can be a good option -- by its very nature, it is immediately operationalised, integrated with ETL pre-processing and ready to be used in production for an end-to-end machine learning pipeline.

aw glue, custom transform, glue, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.32)

Framework for an Intelligent Affect Aware Smart Home Environment for Elderly People

Thakur, Nirmalya, Han, Chia Y.

arXiv.org Artificial Intelligence, Jun-29-2021

The population of elderly people has been increasing at a rapid rate over the last few decades and is expected to increase further in the future. This growing population is associated with increasing needs due to problems such as physical disabilities, cognitive issues, weakened memory and disorganized behavior that elderly people face with increasing age. To reduce their financial burden on the world economy and to enhance their quality of life, it is essential to develop technology-based solutions that are adaptive, assistive and intelligent in nature. Intelligent Affect Aware Systems that can not only analyze but also predict the behavior of elderly people in the context of their day-to-day interactions with technology in an IoT-based environment hold immense potential for serving as a long-term solution for improving the user experience of elderly people in smart homes. This work therefore proposes a framework for an Intelligent Affect Aware environment for elderly people that can not only analyze the affective components of their interactions but also predict their likely user experience even before they start engaging in any activity in the given smart home environment. This forecasting of user experience would provide scope for enhancing it, thereby increasing the assistive and adaptive nature of such intelligent systems. To support the efficacy of the proposed framework for improving the quality of life of elderly people in smart homes, it has been tested on three datasets, and the results are presented and discussed.

complex activity, elderly people, user experience, (13 more...)

2106.15599

Country:

Europe > United Kingdom > England (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Ohio > Hamilton County > Cincinnati (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Smart Houses & Appliances (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.66)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
(4 more...)
