Decision Tree Learning
InfoGram and Admissible Machine Learning
We have entered a new era of machine learning (ML), where the most accurate algorithm with superior predictive power may not even be deployable, unless it is admissible under the regulatory constraints. This has led to great interest in developing fair, transparent and trustworthy ML methods. The purpose of this article is to introduce a new information-theoretic learning framework (admissible machine learning) and algorithmic risk-management tools (InfoGram, L-features, ALFA-testing) that can guide an analyst to redesign off-the-shelf ML methods to be regulatory compliant, while maintaining good prediction accuracy. We have illustrated our approach using several real-data examples from financial sectors, biomedical research, marketing campaigns, and the criminal justice system.
Nanosecond machine learning event classification with boosted decision trees in FPGA for high energy physics
Hong, Tae Min, Carlson, Benjamin, Eubanks, Brandon, Racz, Stephen, Roche, Stephen, Stelzer, Joerg, Stumpp, Daniel
We present a novel implementation of classification using the machine learning / artificial intelligence method called boosted decision trees (BDT) on field programmable gate arrays (FPGA). The firmware implementation of binary classification requiring 100 training trees with a maximum depth of 4 using four input variables gives a latency value of about 10 ns, independent of the clock speed from 100 to 320 MHz in our setup. The low timing values are achieved by restructuring the BDT layout and reconfiguring its parameters. The FPGA resource utilization is also kept low at a range from 0.01% to 0.2% in our setup. A software package called fwXmachina achieves this implementation. Our intended user is an expert of custom electronics-based trigger systems in high energy physics experiments or anyone that needs decisions at the lowest latency values for real-time event classification. Two problems from high energy physics are considered, in the separation of electrons vs. photons and in the selection of vector boson fusion-produced Higgs bosons vs. the rejection of the multijet processes.
100% Off Coupon - Machine Learning & Deep Learning in Python & R
Learn how to solve real life problem using the Machine learning techniques Machine Learning models such as Linear Regression, Logistic Regression, KNN etc. Advanced Machine Learning models such as Decision trees, XGBoost, Random Forest, SVM etc. Understanding of basics of statistics and concepts of Machine Learning How to do basic statistical operations and run ML models in Python Indepth knowledge of data collection and data preprocessing for Machine Learning problem How to convert business problem into a Machine learning problem
Explain Decision Tree's Logic
This article is about how to automatic explain every logics behind every predictions from the decision tree model created by Python's Scikit-Learn. In my case, end users not only want to have prediction results. They also want to know logics behind the prediction to decide next actions. The ability of Machine Learning to predict is very useful. In some cases we also need to know the reason behind that prediction.
Trading Complexity for Sparsity in Random Forest Explanations
Audemard, Gilles, Bellart, Steve, Bounia, Louenas, Koriche, Frédéric, Lagniez, Jean-Marie, Marquis, Pierre
Random forests have long been considered as powerful model ensembles in machine learning. By training multiple decision trees, whose diversity is fostered through data and feature subsampling, the resulting random forest can lead to more stable and reliable predictions than a single decision tree. This however comes at the cost of decreased interpretability: while decision trees are often easily interpretable, the predictions made by random forests are much more difficult to understand, as they involve a majority vote over hundreds of decision trees. In this paper, we examine different types of reasons that explain "why" an input instance is classified as positive or negative by a Boolean random forest. Notably, as an alternative to sufficient reasons taking the form of prime implicants of the random forest, we introduce majoritary reasons which are prime implicants of a strict majority of decision trees. For these different abductive explanations, the tractability of the generation problem (finding one reason) and the minimization problem (finding one shortest reason) are investigated. Experiments conducted on various datasets reveal the existence of a trade-off between runtime complexity and sparsity. Sufficient reasons - for which the identification problem is DP-complete - are slightly larger than majoritary reasons that can be generated using a simple linear- time greedy algorithm, and significantly larger than minimal majoritary reasons that can be approached using an anytime P ARTIAL M AX SAT algorithm.
On the Explanatory Power of Decision Trees
Audemard, Gilles, Bellart, Steve, Bounia, Louenas, Koriche, Frédéric, Lagniez, Jean-Marie, Marquis, Pierre
Decision trees have long been recognized as models of choice in sensitive applications where interpretability is of paramount importance. In this paper, we examine the computational ability of Boolean decision trees in deriving, minimizing, and counting sufficient reasons and contrastive explanations. We prove that the set of all sufficient reasons of minimal size for an instance given a decision tree can be exponentially larger than the size of the input (the instance and the decision tree). Therefore, generating the full set of sufficient reasons can be out of reach. In addition, computing a single sufficient reason does not prove enough in general; indeed, two sufficient reasons for the same instance may differ on many features. To deal with this issue and generate synthetic views of the set of all sufficient reasons, we introduce the notions of relevant features and of necessary features that characterize the (possibly negated) features appearing in at least one or in every sufficient reason, and we show that they can be computed in polynomial time. We also introduce the notion of explanatory importance, that indicates how frequent each (possibly negated) feature is in the set of all sufficient reasons. We show how the explanatory importance of a feature and the number of sufficient reasons can be obtained via a model counting operation, which turns out to be practical in many cases. We also explain how to enumerate sufficient reasons of minimal size. We finally show that, unlike sufficient reasons, the set of all contrastive explanations for an instance given a decision tree can be derived, minimized and counted in polynomial time.
Regression Trees
This Blog assumes that the reader is familiar with the concept of Decision Trees and Regression. If not, refer to the blogs below. Having read the above blogs or Having already being familiar with the appropriate topics, you hopefully understand what is a decision tree, by now ( The one we used for classification task). A regression tree is basically a decision tree that is used for the task of regression which can be used to predict continuous valued outputs instead of discrete outputs. In Decision Trees for Classification, we saw how the tree asks right questions at the right node in order to give accurate and efficient classifications.
Desk Organization: Effect of Multimodal Inputs on Spatial Relational Learning
Rowe, Ryan, Singhal, Shivam, Yi, Daqing, Bhattacharjee, Tapomayukh, Srinivasa, Siddhartha S.
For robots to operate in a three dimensional world and interact with humans, learning spatial relationships among objects in the surrounding is necessary. Reasoning about the state of the world requires inputs from many different sensory modalities including vision ($V$) and haptics ($H$). We examine the problem of desk organization: learning how humans spatially position different objects on a planar surface according to organizational ''preference''. We model this problem by examining how humans position objects given multiple features received from vision and haptic modalities. However, organizational habits vary greatly between people both in structure and adherence. To deal with user organizational preferences, we add an additional modality, ''utility'' ($U$), which informs on a particular human's perceived usefulness of a given object. Models were trained as generalized (over many different people) or tailored (per person). We use two types of models: random forests, which focus on precise multi-task classification, and Markov logic networks, which provide an easily interpretable insight into organizational habits. The models were applied to both synthetic data, which proved to be learnable when using fixed organizational constraints, and human-study data, on which the random forest achieved over 90% accuracy. Over all combinations of $\{H, U, V\}$ modalities, $UV$ and $HUV$ were the most informative for organization. In a follow-up study, we gauged participants preference of desk organizations by a generalized random forest organization vs. by a random model. On average, participants rated the random forest models as 4.15 on a 5-point Likert scale compared to 1.84 for the random model
Efficacy of Statistical and Artificial Intelligence-based False Information Cyberattack Detection Models for Connected Vehicles
Khan, Sakib Mahmud, Comert, Gurcan, Chowdhury, Mashrur
Connected vehicles (CVs), because of the external connectivity with other CVs and connected infrastructure, are vulnerable to cyberattacks that can instantly compromise the safety of the vehicle itself and other connected vehicles and roadway infrastructure. One such cyberattack is the false information attack, where an external attacker injects inaccurate information into the connected vehicles and eventually can cause catastrophic consequences by compromising safety-critical applications like the forward collision warning. The occurrence and target of such attack events can be very dynamic, making real-time and near-real-time detection challenging. Change point models, can be used for real-time anomaly detection caused by the false information attack. In this paper, we have evaluated three change point-based statistical models; Expectation Maximization, Cumulative Summation, and Bayesian Online Change Point Algorithms for cyberattack detection in the CV data. Also, data-driven artificial intelligence (AI) models, which can be used to detect known and unknown underlying patterns in the dataset, have the potential of detecting a real-time anomaly in the CV data. We have used six AI models to detect false information attacks and compared the performance for detecting the attacks with our developed change point models. Our study shows that change points models performed better in real-time false information attack detection compared to the performance of the AI models. Change point models having the advantage of no training requirements can be a feasible and computationally efficient alternative to AI models for false information attack detection in connected vehicles.
SONG: Self-Organizing Neural Graphs
Struski, Łukasz, Danel, Tomasz, Śmieja, Marek, Tabor, Jacek, Zieliński, Bartosz
Recent years have seen a surge in research on deep interpretable neural networks with decision trees as one of the most commonly incorporated tools. There are at least three advantages of using decision trees over logistic regression classification models: they are easy to interpret since they are based on binary decisions, they can make decisions faster, and they provide a hierarchy of classes. However, one of the well-known drawbacks of decision trees, as compared to decision graphs, is that decision trees cannot reuse the decision nodes. Nevertheless, decision graphs were not commonly used in deep learning due to the lack of efficient gradient-based training techniques. In this paper, we fill this gap and provide a general paradigm based on Markov processes, which allows for efficient training of the special type of decision graphs, which we call Self-Organizing Neural Graphs (SONG). We provide an extensive theoretical study of SONG, complemented by experiments conducted on Letter, Connect4, MNIST, CIFAR, and TinyImageNet datasets, showing that our method performs on par or better than existing decision models.