Decision Tree Learning
Four interpretable algorithms that you should use in 2022
The new year has begun, and it is time for good resolutions. One of them could be to make decision-making processes more interpretable. To help you do this, I present four interpretable rule-based algorithms. All four use an ensemble of decision trees (such as Random Forest, AdaBoost, or Gradient Boosting) as a rule generator. In other words, each of these interpretable algorithms starts by fitting a black-box model and then derives an interpretable rule ensemble model from it.
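To make the idea of a tree as a rule generator concrete, here is a minimal sketch. It does not reproduce any of the four algorithms; it only shows how every root-to-leaf path of a single decision tree can be read as an if-then rule. The nested-dict tree format, the feature names (`age`, `income`) and the function `extract_rules` are assumptions made for this illustration.

```python
# Hypothetical toy tree: internal nodes test "feature <= threshold",
# leaves carry a prediction. This format is invented for the example.
TREE = {
    "feature": "age",
    "threshold": 30,
    "left": {"prediction": "low"},
    "right": {
        "feature": "income",
        "threshold": 50,
        "left": {"prediction": "low"},
        "right": {"prediction": "high"},
    },
}


def extract_rules(node, conditions=()):
    """Return one (conditions, prediction) pair per root-to-leaf path."""
    if "prediction" in node:
        # Leaf reached: the conditions collected so far form one rule.
        return [(list(conditions), node["prediction"])]
    feature, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    right = extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    return left + right


if __name__ == "__main__":
    for conditions, prediction in extract_rules(TREE):
        print("IF " + " AND ".join(conditions) + " THEN " + prediction)
```

With an ensemble, the same walk would be repeated over every tree, which typically yields a large pool of candidate rules that then has to be filtered or weighted.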
Phishing Websites Classification
The figure above shows the class imbalance between phishing and non-phishing examples in the dataset. The table above reports the scores for the most important classification metrics; Random Forest obtained the highest score. The Receiver Operating Characteristic (ROC) curve is a common method for evaluating the quality of a binary classifier: it plots the true positive rate against the false positive rate at every probability threshold. According to the figure above, Random Forest and Decision Tree predicted more observations correctly than the other classifiers.
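As a rough illustration of how an ROC curve is built, the sketch below sweeps the decision threshold over the predicted scores and records the false positive rate and true positive rate at each step, then integrates the curve with the trapezoidal rule. The function names `roc_curve_points` and `auc` and the four-sample toy data are assumptions for this example, not part of the phishing project.

```python
def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) lists with one point per distinct score threshold.

    y_true holds 0/1 labels, scores holds the predicted probability of class 1.
    Assumes both classes are present in y_true.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Visit samples from highest to lowest score: lowering the threshold
    # turns one more sample (or one tied group) into a predicted positive.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    predicted = [0.1, 0.4, 0.35, 0.8]
    fpr, tpr = roc_curve_points(labels, predicted)
    print(list(zip(fpr, tpr)))
    print(auc(fpr, tpr))  # 0.75 for this toy data
```

An area of 1.0 corresponds to a classifier that ranks every positive above every negative, while 0.5 is what random ranking would give on average.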
Yoga-Pose-Estimator
An ML model that classifies yoga poses into five well-known asanas, namely downward dog, plank pose, tree pose, goddess pose, and warrior-2 pose, using Mediapipe Blazepose for feature extraction. Images are first resized to reduce computation. Gamma correction is a non-linear adjustment to individual pixel values. Whereas image normalization carries out linear operations on individual pixels, gamma correction carries out a non-linear operation on the source image pixels and can cause saturation of the image being altered. Machine learning algorithms (Random Forest, SVC, Decision Tree, KNN, Adaboost, RFC) are then trained and tested on the generated dataframe (csv) to find which model fits best.
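The difference between linear normalization and gamma correction can be sketched on a handful of 8-bit pixel values. This is only an illustration under the common power-law definition of gamma correction; the function names, the `max_value` parameter and the sample pixels are assumptions, and the project itself may use an image library instead.

```python
def normalize_linear(pixels, max_value=255):
    """Linear operation: every pixel is scaled by the same constant factor."""
    return [p / max_value for p in pixels]


def gamma_correct(pixels, gamma, max_value=255):
    """Non-linear operation: out = max_value * (in / max_value) ** gamma.

    With this convention, gamma < 1 brightens mid-tones and gamma > 1
    darkens them, while pure black and pure white stay fixed.
    """
    return [round(max_value * (p / max_value) ** gamma) for p in pixels]


if __name__ == "__main__":
    sample = [0, 64, 128, 255]
    print(normalize_linear(sample))
    print(gamma_correct(sample, gamma=0.5))  # mid-tones move up
    print(gamma_correct(sample, gamma=2.0))  # mid-tones move down
```

Because the mapping is a curve rather than a straight line, bright or dark regions get compressed toward the ends of the range, which is where the saturation mentioned above comes from.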
EiFFFeL: Enforcing Fairness in Forests by Flipping Leaves
Abebe, Seyum Assefa, Lucchese, Claudio, Orlando, Salvatore
Nowadays Machine Learning (ML) techniques are extensively adopted in many socially sensitive systems, which requires a careful study of the fairness of the decisions taken by such systems. Many approaches have been proposed to address and prevent bias against individuals or specific groups, which might originally come from biased training datasets or algorithm design. In this regard, we propose a fairness enforcing approach called EiFFFeL: Enforcing Fairness in Forests by Flipping Leaves, which exploits tree-based or leaf-based post-processing strategies to relabel leaves of selected decision trees of a given forest. Experimental results show that our approach achieves a user-defined group fairness degree without losing a significant amount of accuracy.
[100%OFF] Machine Learning & Deep Learning in Python & R
Learn how to solve real-life problems using machine learning techniques. The course covers: Machine Learning models such as Linear Regression, Logistic Regression, and KNN; advanced Machine Learning models such as Decision Trees, XGBoost, Random Forest, and SVM; the basics of statistics and the concepts of Machine Learning; how to do basic statistical operations and run ML models in Python; in-depth knowledge of data collection and data preprocessing for Machine Learning problems; and how to convert a business problem into a Machine Learning problem.
DANets: Deep Abstract Networks for Tabular Data Classification and Regression
Chen, Jintai, Liao, Kuanlun, Wan, Yao, Chen, Danny Z., Wu, Jian
Tabular data are ubiquitous in real world applications. Although many commonly-used neural components (e.g., convolution) and extensible neural networks (e.g., ResNet) have been developed by the machine learning community, few of them were effective for tabular data and few designs were adequately tailored for tabular data structures. In this paper, we propose a novel and flexible neural component for tabular data, called Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. Also, we design a structure re-parameterization method to compress AbstLay, thus reducing the computational complexity by a clear margin in the reference phase. A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks. In DANets, a special shortcut path is introduced to fetch information from raw tabular features, assisting feature interactions across different levels. Comprehensive experiments on seven real-world tabular datasets show that our AbstLay and DANets are effective for tabular data classification and regression, and the computational complexity is superior to competitive methods. Besides, we evaluate the performance gains of DANet as it goes deep, verifying the extendibility of our method. Our code is available at https://github.com/WhatAShot/DANet.
AI in Software Engineering -- Present and Future
AI (Artificial Intelligence), as we know it, is the reason behind all the advancements that we see in today's world on the technology front (of course!). Soon, we will see machines or robots taking over most human work. From healthcare to insurance, banking to finance, eCommerce to Edtech and Fintech, we can see the footprints and lasting impressions of AI in every industry domain, and Software/IT is no exception. When we talk about software engineering, software development and all related aspects of the SDLC (Software Development Lifecycle) come under it. From analyzing the requirements to designing, developing, deploying, and testing, software engineering covers all these areas and more.
House Price Prediction using a Random Forest Classifier
In this blog post, I will use machine learning and Python to predict house prices. I will use a Random Forest Classifier (in fact, a Random Forest regressor, since the price is a continuous target). At the end, I will demonstrate my Random Forest Python algorithm! There is no law except the law that there is no law. Data Science is about discovering hidden patterns (laws) in your data.
Explanation of Machine Learning Models Using Shapley Additive Explanation and Application for Real Data in Hospital
Nohara, Yasunobu, Matsumoto, Koutarou, Soejima, Hidehisa, Nakashima, Naoki
When using machine learning techniques in decision-making processes, the interpretability of the models is important. In the present paper, we adopted the Shapley additive explanation (SHAP), which is based on fair profit allocation among many stakeholders depending on their contribution, for interpreting a gradient-boosting decision tree model using hospital data. For better interpretability, we propose two novel techniques as follows: (1) a new metric of feature importance using SHAP and (2) a technique termed feature packing, which packs multiple similar features into one grouped feature to allow an easier understanding of the model without reconstruction of the model. We then compared the explanation results between the SHAP framework and existing methods. In addition, we showed how the A/G ratio works as an important prognostic factor for cerebral infarction using our hospital data and proposed techniques.
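The "fair profit allocation" behind SHAP is the Shapley value from cooperative game theory. The sketch below computes exact Shapley values for a tiny two-player game by averaging each player's marginal contribution over all join orders. It is an illustration of the allocation principle only, not the paper's method or the SHAP library; the function `shapley_values` and the payoff table are hypothetical. In the SHAP setting the players would be features and the payoff would be the model output.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by enumerating every ordering of the players.

    `value` maps a frozenset of players to the payoff of that coalition.
    The cost grows factorially, so this is only usable for tiny games.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            enlarged = coalition | {player}
            # Marginal contribution of `player` when joining `coalition`.
            totals[player] += value(enlarged) - value(coalition)
            coalition = enlarged
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical payoff table for a two-player game.
PAYOFF = {
    frozenset(): 0,
    frozenset({"A"}): 10,
    frozenset({"B"}): 20,
    frozenset({"A", "B"}): 50,
}

if __name__ == "__main__":
    phi = shapley_values(["A", "B"], PAYOFF.__getitem__)
    print(phi)  # {'A': 20.0, 'B': 30.0}
```

The two values add up to the payoff of the full coalition (50), which is the property that lets SHAP decompose a single prediction into per-feature contributions.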
Towards a Science of Human-AI Decision Making: A Survey of Empirical Studies
Lai, Vivian, Chen, Chacha, Liao, Q. Vera, Smith-Renner, Alison, Tan, Chenhao
As AI systems demonstrate increasingly strong predictive performance, their adoption has grown in numerous domains. However, in high-stakes domains such as criminal justice and healthcare, full automation is often not desirable due to safety, ethical, and legal concerns, yet fully manual approaches can be inaccurate and time consuming. As a result, there is growing interest in the research community to augment human decision making with AI assistance. Besides developing AI technologies for this purpose, the emerging field of human-AI decision making must embrace empirical approaches to form a foundational understanding of how humans interact and work with AI to make decisions. To invite and help structure research efforts towards a science of understanding and improving human-AI decision making, we survey recent literature of empirical human-subject studies on this topic. We summarize the study design choices made in over 100 papers in three important aspects: (1) decision tasks, (2) AI models and AI assistance elements, and (3) evaluation metrics. For each aspect, we summarize current trends, discuss gaps in current practices of the field, and make a list of recommendations for future research. Our survey highlights the need to develop common frameworks to account for the design and research spaces of human-AI decision making, so that researchers can make rigorous choices in study design, and the research community can build on each other's work and produce generalizable scientific knowledge. We also hope this survey will serve as a bridge for HCI and AI communities to work together to mutually shape the empirical science and computational technologies for human-AI decision making.