 Decision Tree Learning


#011 Machine Learning - Decision tree - Master Data Science 15.08.2022

#artificialintelligence

In today's post, we are going to talk about a learning algorithm that is very powerful and is used in many machine learning applications: decision trees and tree ensembles. It is a tool that is well worth having in your toolbox. In this post, we'll learn how decision trees work and see how you can apply them in your own machine learning projects. So, let's begin. To explain how decision trees work, we are going to use the following cat classification example. Let's imagine that you are running a cat adoption center and, given a few features, you want to train a classifier to quickly tell you whether an animal is a cat or not.
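To make the cat example concrete, here is a minimal sketch of how a trained decision tree classifies one animal. The feature names (`pointy_ears`, `has_whiskers`) and the tree itself are hypothetical and hand-built for illustration; the post does not list them, and a real tree would be learned from data.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # prediction stored at the end of a path


@dataclass
class Node:
    feature: str        # boolean feature tested at this node
    if_true: "Tree"     # subtree followed when the feature is True
    if_false: "Tree"    # subtree followed when the feature is False


Tree = Union[Leaf, Node]


def predict(tree: Tree, animal: Dict[str, bool]) -> str:
    """Walk from the root to a leaf, testing one feature per node."""
    while isinstance(tree, Node):
        tree = tree.if_true if animal[tree.feature] else tree.if_false
    return tree.label


# Hypothetical hand-built tree (an assumption, not taken from the post).
TREE = Node(
    feature="pointy_ears",
    if_true=Node(
        feature="has_whiskers",
        if_true=Leaf("cat"),
        if_false=Leaf("not cat"),
    ),
    if_false=Leaf("not cat"),
)

print(predict(TREE, {"pointy_ears": True, "has_whiskers": True}))
```

Each prediction is just a sequence of yes/no questions, which is why decision trees are easy to read and explain.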


Quality Diversity Evolutionary Learning of Decision Trees

arXiv.org Artificial Intelligence

Addressing the need for explainable Machine Learning has emerged as one of the most important research directions in modern Artificial Intelligence (AI). While the current dominant paradigm in the field is based on black-box models, typically in the form of (deep) neural networks, these models lack direct interpretability for human users, i.e., their outcomes (and, even more so, their inner workings) are opaque and hard to understand. This is hindering the adoption of AI in safety-critical applications, where high interests are at stake. In these applications, explainable-by-design models, such as decision trees, may be more suitable, as they provide interpretability. Recent works have proposed the hybridization of decision trees and Reinforcement Learning, to combine the advantages of the two approaches. So far, however, these works have focused on the optimization of those hybrid models. Here, we apply MAP-Elites for diversifying hybrid models over a feature space that captures both the model complexity and its behavioral variability. We apply our method to two well-known control problems from the OpenAI Gym library, on which we discuss the "illumination" patterns projected by MAP-Elites, comparing its results against existing similar approaches.


Towards Explainable Meta-Learning for DDoS Detection

arXiv.org Artificial Intelligence

The Internet is the most complex machine humankind has ever built, and how to defend it from intrusions is even more complex. With the ever-increasing number of new intrusions, intrusion detection tasks rely more and more on Artificial Intelligence. Interpretability and transparency of the machine learning model are the foundation of trust in AI-driven intrusion detection results. Current interpretable Artificial Intelligence technologies in intrusion detection are heuristic, which is neither accurate nor sufficient. This paper proposes a rigorous, interpretable, Artificial Intelligence-driven intrusion detection approach based on an artificial immune system. Details of the rigorous interpretation calculation process for a decision tree model are presented. Prime implicant explanations for benign traffic flows are given in detail as rules for negative selection of the cyber immune system. Experiments are carried out on real-life traffic.


Machine Learning-Based Test Smell Detection

arXiv.org Artificial Intelligence

Context: Test smells are symptoms of sub-optimal design choices adopted when developing test cases. Previous studies have proved their harmfulness for test code maintainability and effectiveness. Therefore, researchers have been proposing automated, heuristic-based techniques to detect them. However, the performance of such detectors is still limited and dependent on thresholds to be tuned. Objective: We propose the design and experimentation of a novel test smell detection approach based on machine learning to detect four test smells. Method: We plan to develop the largest dataset of manually-validated test smells. This dataset will be leveraged to train six machine learners and assess their capabilities in within- and cross-project scenarios. Finally, we plan to compare our approach with state-of-the-art heuristic-based techniques.


Carbon Footprint Management with Data-Driven AI and IoT

#artificialintelligence

We have been chosen as winners of the Climate Hackathon 2022 competition organized by Microsoft. The aim of this competition was to find new solutions to prevent climate change by utilizing new technologies. We entered the competition with a solution that we had already started designing and working on, but this hackathon gave us some needed urgency to finalize it. Going forward, we are ready to continue turning the proposed solution into a marketable product that can help other companies improve their environmental sustainability. The competition had three distinct challenges, from which teams could choose one to solve.


Combining Predictions under Uncertainty: The Case of Random Decision Trees

arXiv.org Artificial Intelligence

A common approach to aggregate classification estimates in an ensemble of decision trees is to either use voting or to average the probabilities for each class. The latter takes uncertainty into account, but not the reliability of the uncertainty estimates (so to speak, the "uncertainty about the uncertainty"). More generally, much remains unknown about how to best combine probabilistic estimates from multiple sources. In this paper, we investigate a number of alternative prediction methods. Our methods are inspired by the theories of probability, belief functions and reliable classification, as well as a principle that we call evidence accumulation. Our experiments on a variety of data sets are based on random decision trees, which guarantee a high diversity in the predictions to be combined. Somewhat unexpectedly, we found that taking the average over the probabilities is actually hard to beat. However, evidence accumulation showed consistently better results on all but very small leaves.
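The two baseline aggregation rules named in the abstract, voting and probability averaging, can disagree on the same ensemble output. The sketch below illustrates this with three hypothetical trees and made-up class probabilities; it does not implement the paper's alternative methods such as evidence accumulation.

```python
from collections import Counter
from typing import List, Sequence


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest probability (first one wins on ties)."""
    return max(range(len(probs)), key=lambda c: probs[c])


def majority_vote(per_tree_probs: List[Sequence[float]]) -> int:
    """Each tree votes for its most probable class; the most common vote wins."""
    votes = [argmax(p) for p in per_tree_probs]
    return Counter(votes).most_common(1)[0][0]


def average_probabilities(per_tree_probs: List[Sequence[float]]) -> List[float]:
    """Average the class probabilities over all trees."""
    n_trees = len(per_tree_probs)
    n_classes = len(per_tree_probs[0])
    return [sum(p[c] for p in per_tree_probs) / n_trees for c in range(n_classes)]


# Made-up example: two trees lean weakly towards class 0,
# one tree is very confident in class 1.
ensemble = [[0.55, 0.45], [0.55, 0.45], [0.05, 0.95]]

voted = majority_vote(ensemble)                      # class 0 (two votes to one)
averaged = argmax(average_probabilities(ensemble))   # class 1 (mean of about 0.62)
print(voted, averaged)
```

Voting discards how confident each tree is, while averaging keeps that information; neither accounts for how reliable each tree's probability estimate is, which is the gap the abstract points to.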


Ensemble Modeling

#artificialintelligence

In the world of analytics, modeling is a general term used to refer to the use of data mining (machine learning) methods to develop predictions. If you want to know what ad a particular user is more likely to click on, or which customers are likely to leave you for a competitor, you develop a predictive model. There are a lot of models to choose from: Regression, Decision Trees, K Nearest Neighbor, Neural Nets, etc. They all will provide you with a prediction, but some will do better than others depending on the data you are working with. While there are certain tricks and tweaks one can do to improve the accuracy of these models, it never hurts to remember that there is wisdom to be found in the masses.


RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data

arXiv.org Artificial Intelligence

Background: Understanding the relationship between the Omics and the phenotype is a central problem in precision medicine. The high dimensionality of metabolomics data challenges learning algorithms in terms of scalability and generalization. Most learning algorithms do not produce interpretable models. Method: We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules. Results: Applications on metabolomics data show that it produces models that achieve high predictive performance. The interpretability of the models makes them useful for biomarker discovery and pattern discovery in high-dimensional data.


2060: Civilization, Energy, and Progression of Mankind on the Kardashev Scale

arXiv.org Artificial Intelligence

Energy has been propelling the development of human civilization for millennia, and technologies acquiring energy beyond human and animal power have been continuously advanced and transformed. In 1964, the Kardashev Scale was proposed to quantify the relationship between energy consumption and the development of civilizations. Human civilization presently stands at Type 0.7276 on this scale. Projecting the future energy consumption, estimating the change of its constituting structure, and evaluating the influence of possible technological revolutions are critical in the context of civilization development. In this study, we use two machine learning models, random forest (RF) and autoregressive integrated moving average (ARIMA), to simulate and predict energy consumption on a global scale. We further project the position of human civilization on the Kardashev Scale in 2060. The result shows that the global energy consumption is expected to reach 928-940 EJ in 2060, with a total growth of over 50% in the coming 40 years, and our civilization is expected to achieve Type 0.7474 on the Kardashev Scale, still far away from a Type 1 civilization. Additionally, we discuss the potential energy segmentation change before 2060 and present the influence of the advent of nuclear fusion in this context.
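As a rough check on the scale values quoted above, the sketch below uses Carl Sagan's interpolation formula, K = (log10(P) - 6) / 10 with P in watts. The abstract does not say which formula the authors use, so this choice is an assumption, as is the year length used to convert EJ per year to watts.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed Julian year


def kardashev_from_ej_per_year(energy_ej: float) -> float:
    """Kardashev rating from annual energy use in exajoules (Sagan's formula)."""
    power_watts = energy_ej * 1e18 / SECONDS_PER_YEAR  # 1 EJ = 1e18 J
    return (math.log10(power_watts) - 6) / 10


# Upper end of the 2060 projection quoted in the abstract.
print(round(kardashev_from_ej_per_year(940), 4))  # 0.7474
```

Under these assumptions, the upper projection of 940 EJ lands on about Type 0.7474, which matches the figure in the abstract. Because the scale is logarithmic, a growth of over 50% in consumption moves the rating by only about 0.02.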


Global Evaluation for Decision Tree Learning

arXiv.org Artificial Intelligence

We transfer distances on clusterings to the building process of decision trees, and as a consequence extend the classical ID3 algorithm to perform modifications based on the global distance of the tree to the ground truth, instead of considering single leaves. Next, we evaluate this idea in comparison with the original version and discuss occurring problems, but also strengths of the global approach. On this basis, we finish by identifying other scenarios where global evaluations are worthwhile. The classification problem in machine learning asks, given some observed instances with known outcomes (called the labeled training data), to make predictions on outcomes of unseen instances. Formally, let Ω be a universe of instances. Outcomes of instances in the training set X ⊆ Ω, also called class labels, are given by a map y: Ω → {1, ..., k}. One popular choice of a model to train is the decision tree.
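For reference, the classical ID3 algorithm picks each split locally, by information gain over the class labels in {1, ..., k}. The sketch below shows only that classical criterion on a made-up training set with hypothetical attributes `a` and `b`; it is not the global-distance variant the paper proposes.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Dict[str, Hashable]],
                     labels: Sequence[int],
                     attr: str) -> float:
    """Entropy reduction obtained by splitting the data on one attribute."""
    groups: Dict[Hashable, List[int]] = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs: Sequence[str]) -> str:
    """ID3's local choice: the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Made-up training set: attribute "a" determines the label, "b" is noise.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [1, 1, 2, 2]

print(best_attribute(rows, labels, ["a", "b"]))  # a
```

Each split here is judged only by the purity of the groups it creates, which is the leaf-by-leaf view that the abstract contrasts with a global distance of the whole tree to the ground truth.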