AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

A Performance Comparison of Data Mining Algorithms Based Intrusion Detection System for Smart Grid

Mrabet, Zakaria El, Ghazi, Hassan El, Kaabouch, Naima

arXiv.org Machine LearningDec-31-2019

Smart grid is an emerging and promising technology. It uses the power of information technologies to deliver intelligently the electrical power to customers, and it allows the integration of the green technology to meet the environmental requirements. Unfortunately, information technologies have its inherent vulnerabilities and weaknesses that expose the smart grid to a wide variety of security risks. The Intrusion detection system (IDS) plays an important role in securing smart grid networks and detecting malicious activity, yet it suffers from several limitations. Many research papers have been published to address these issues using several algorithms and techniques. Therefore, a detailed comparison between these algorithms is needed. This paper presents an overview of four data mining algorithms used by IDS in Smart Grid. An evaluation of performance of these algorithms is conducted based on several metrics including the probability of detection, probability of false alarm, probability of miss detection, efficiency, and processing time. Results show that Random Forest outperforms the other three algorithms in detecting attacks with higher probability of detection, lower probability of false alarm, lower probability of miss detection, and higher accuracy.

algorithm, probability, random forest, (13 more...)

arXiv.org Machine Learning

doi: 10.1109/EIT.2019.8834255

2001.00917

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > North Dakota (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Add feedback

Smell Pittsburgh: Engaging Community Citizen Science for Air Quality

Hsu, Yen-Chia, Cross, Jennifer, Dille, Paul, Tasota, Michael, Dias, Beatrice, Sargent, Randy, Huang, Ting-Hao 'Kenneth', Nourbakhsh, Illah

arXiv.org Artificial IntelligenceDec-30-2019

Urban air pollution has been linked to various human health concerns, including cardiopulmonary diseases. Communities who suffer from poor air quality often rely on experts to identify pollution sources due to the lack of accessible tools. Taking this into account, we developed Smell Pittsburgh, a system that enables community members to report odors and track where these odors are frequently concentrated. All smell report data are publicly accessible online. These reports are also sent to the local health department and visualized on a map along with air quality data from monitoring stations. This visualization provides a comprehensive overview of the local pollution landscape. Additionally, with these reports and air quality data, we developed a model to predict upcoming smell events and send push notifications to inform communities. We also applied regression analysis to identify statistically significant effects of push notifications on user engagement. Our evaluation of this system demonstrates that engaging residents in documenting their experiences with pollution odors can help identify local air pollution patterns, and can empower communities to advocate for better air quality. All citizen-contributed smell data are publicly accessible and can be downloaded from https://smellpgh.org.

interactive intelligent system, notification, pittsburgh, (11 more...)

arXiv.org Artificial Intelligence

1912.11936

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law > Environmental Law (1.00)
Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Public Health (0.88)
(2 more...)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (1.00)
(4 more...)

Add feedback

The Application of Machine Learning Techniques for Predicting Results in Team Sport: A Review

Bunker, Rory, Susnjak, Teo

arXiv.org Machine LearningDec-25-2019

Over the past two decades, Machine Learning (ML) techniques have been increasingly utilized for the purpose of predicting outcomes in sport. In this paper, we provide a review of studies that have used ML for predicting results in team sport, covering studies from 1996 to 2019. We sought to answer five key research questions while extensively surveying papers in this field. This paper offers insights into which ML algorithms have tended to be used in this field, as well as those that are beginning to emerge with successful outcomes. Our research highlights defining characteristics of successful studies and identifies robust strategies for evaluating accuracy results in this application domain. Our study considers accuracies that have been achieved across different sports and explores the notion that outcomes of some team sports could be inherently more difficult to predict than others. Finally, our study uncovers common themes of future research directions across all surveyed papers, looking for gaps and opportunities, while proposing recommendations for future researchers in this domain.

accuracy, deep learning, soccer, (23 more...)

arXiv.org Machine Learning

1912.11762

Country:

Europe (0.45)
Oceania (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Sports > Football (1.00)
Leisure & Entertainment > Sports > Basketball (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.95)
(4 more...)

Add feedback

r/MachineLearning - [D] Decision Tree Splitting strategy

#artificialintelligenceDec-24-2019, 13:39:34 GMT

I have a dataset with 4 categorical features (Cholesterol, Systolic Blood pressure, diastolic blood pressure, and smoking rate). I use a decision tree classifier to find the probability of stroke. I am trying to verify my understanding of the splitting procedure done by Python Sklearn. Since it is a binary tree, there are three possible ways to split the first feature which is either to group categories {0 and 1 to a leaf, 2 to another leaf} or {0 and 2, 1}, or {0, 1 and 2}. What I know (please correct me here) is that the chosen split is the one with the highest information gain.

decision tree splitting strategy, information gain, machinelearning, (1 more...)

#artificialintelligence

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)

Add feedback

A Study of the Learnability of Relational Properties (Model Counting Meets Machine Learning)

Usman, Muhammad, Wang, Wenxi, Wang, Kaiyuan, Vasic, Marko, Vikalo, Haris, Khurshid, Sarfraz

arXiv.org Artificial IntelligenceDec-24-2019

Relational properties, e.g., the connectivity structure of nodes in a distributed system, have many applications in software design and analysis. However, such properties often have to be written manually, which can be costly and error-prone. This paper introduces the MCML approach for empirically studying the learnability of a key class of such properties that can be expressed in the well-known software design language Alloy. A key novelty of MCML is quantification of the performance of and semantic differences among trained machine learning (ML) models, specifically decision trees, with respect to entire input spaces (up to a bound on the input size), and not just for given training and test datasets (as is the common practice). MCML reduces the quantification problems to the classic complexity theory problem of model counting, and employs state-of-the-art approximate and exact model counters for high efficiency. The results show that relatively simple ML models can achieve surprisingly high performance (accuracy and F1 score) at learning relational properties when evaluated in the common setting of using training and test datasets -- even when the training dataset is much smaller than the test dataset -- indicating the seeming simplicity of learning these properties. However, the use of MCML metrics based on model counting shows that the performance can degrade substantially when tested against the whole (bounded) input space, indicating the high complexity of precisely learning these properties, and the usefulness of model counting in quantifying the true accuracy.

dataset, decision tree, formula, (16 more...)

arXiv.org Artificial Intelligence

1912.1158

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

ADD-Lib: Decision Diagrams in Practice

Gossen, Frederik, Murtovi, Alnis, Zweihoff, Philip, Steffen, Bernhard

arXiv.org Artificial IntelligenceDec-24-2019

In the paper, we present the ADD-Lib, our efficient and easy to use framework for Algebraic Decision Diagrams (ADDs). The focus of the ADD-Lib is not so much on its efficient implementation of individual operations, which are taken by other established ADD frameworks, but its ease and flexibility, which arise at two levels: the level of individual ADD-tools, which come with a dedicated user-friendly web-based graphical user interface, and at the meta level, where such tools are specified. Both levels are described in the paper: the meta level by explaining how we can construct an ADD-tool tailored for Random Forest refinement and evaluation, and the accordingly generated Web-based domain-specific tool, which we also provide as an artifact for cooperative experimentation. In particular, the artifact allows readers to combine a given Random Forest with their own ADDs regarded as expert knowledge and to experience the corresponding effect.

algebraic structure, decision diagram, random forest, (12 more...)

arXiv.org Artificial Intelligence

1912.11308

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Middle East > Cyprus > Limassol > Limassol (0.04)
North America > United States > District of Columbia > Washington (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.48)

Add feedback

Large Random Forests: Optimisation for Rapid Evaluation

Gossen, Frederik, Steffen, Bernhard

arXiv.org Machine LearningDec-23-2019

Random Forests are one of the most popular classifiers in machine learning. The larger they are, the more precise is the outcome of their predictions. However, this comes at a cost: their running time for classification grows linearly with the number of trees, i.e. the size of the forest. In this paper, we propose a method to aggregate large Random Forests into a single, semantically equivalent decision diagram. Our experiments on various popular datasets show speed-ups of several orders of magnitude, while, at the same time, also significantly reducing the size of the required data structure.

decision diagram, diagram, random forest, (14 more...)

arXiv.org Machine Learning

1912.10934

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > District of Columbia > Washington (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)

Add feedback

Interpreting Predictive Process Monitoring Benchmarks

Sindhgatta, Renuka, Ouyang, Chun, Moreira, Catarina, Liao, Yi

arXiv.org Machine LearningDec-22-2019

Predictive process analytics has recently gained significant attention, and yet its successful adoption in organisations relies on how well users can trust the predictions of the underlying machine learning algorithms that are often applied and recognised as a `black-box'. Without understanding the rationale of the black-box machinery, there will be a lack of trust in the predictions, a reluctance to use the predictions, and in the worse case, consequences of an incorrect decision based on the prediction. In this paper, we emphasise the importance of interpreting the predictive models in addition to the evaluation using conventional metrics, such as accuracy, in the context of predictive process monitoring. We review existing studies on business process monitoring benchmarks for predicting process outcomes and remaining time. We derive explanations that present the behaviour of the entire predictive model as well as explanations describing a particular prediction. These explanations are used to reveal data leakages, assess the interpretability of features used by the model, and the degree of the use of process knowledge in the existing benchmark models. Findings from this exploratory study motivate the need to incorporate interpretability in predictive process analytics.

explanation, prediction, predictive model, (14 more...)

arXiv.org Machine Learning

1912.10558

Country: Oceania > Australia > Queensland > Brisbane (0.04)

Genre:

Overview (1.00)
Research Report (0.82)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Efficient Partial Dependence Plots with decision trees

#artificialintelligenceDec-21-2019, 22:40:42 GMT

Partial Dependence Plots (PDPs) are a standard inspection technique for machine learning models. This post will describe both techniques, and explain why the fast way is… well, faster. We will also see that they are not always equivalent. We will briefly describe partial dependence functions. For a more thorough introduction to PDPs, you can refer to the Bible, or to the Interpretable Machine Learning book.

fake sample, proportion, training data, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

Add feedback

Robust Data Preprocessing for Machine-Learning-Based Disk Failure Prediction in Cloud Production Environments

Han, Shujie, Wu, Jun, Xu, Erci, He, Cheng, Lee, Patrick P. C., Qiang, Yi, Zheng, Qixing, Huang, Tao, Huang, Zixi, Li, Rui

arXiv.org Machine LearningDec-20-2019

To provide proactive fault tolerance for modern cloud data centers, extensive studies have proposed machine learning (ML) approaches to predict imminent disk failures for early remedy and evaluated their approaches directly on public datasets (e.g., Backblaze SMART logs). However, in real-world production environments, the data quality is imperfect (e.g., inaccurate labeling, missing data samples, and complex failure types), thereby degrading the prediction accuracy. We present RODMAN, a robust data preprocessing pipeline that refines data samples before feeding them into ML models. We start with a large-scale trace-driven study of over three million disks from Alibaba Cloud's data centers, and motivate the practical challenges in ML-based disk failure prediction. We then design RODMAN with three data preprocessing echniques, namely failure-type filtering, spline-based data filling, and automated pre-failure backtracking, that are applicable for general ML models. Evaluation on both the Alibaba and Backblaze datasets shows that RODMAN improves the prediction accuracy compared to without data preprocessing under various settings.

dataset, disk failure, failure type, (14 more...)

arXiv.org Machine Learning

1912.09722

Country:

Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Services (0.69)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback