 Decision Tree Learning


All About Decision Tree

#artificialintelligence

Originally published on Towards AI. The decision tree is one of the most powerful and important algorithms present in supervised machine learning.


Dynamic Model Tree for Interpretable Data Stream Learning

arXiv.org Artificial Intelligence

Data streams are ubiquitous in modern business and society. In practice, data streams may evolve over time and cannot be stored indefinitely. Effective and transparent machine learning on data streams is thus often challenging. Hoeffding Trees have emerged as a state-of-the-art approach for online predictive modelling. They are easy to train and provide meaningful convergence guarantees under a stationary process. Yet, at the same time, Hoeffding Trees often require heuristic and costly extensions to adjust to distributional change, which may considerably impair their interpretability. In this work, we revisit Model Trees for machine learning in evolving data streams. Model Trees are able to maintain more flexible and locally robust representations of the active data concept, making them a natural fit for data stream applications. Our novel framework, called Dynamic Model Tree, satisfies desirable consistency and minimality properties. In experiments with synthetic and real-world tabular streaming data sets, we show that the proposed framework can drastically reduce the number of splits required by existing incremental decision trees. At the same time, our framework often outperforms state-of-the-art models in terms of predictive quality -- especially when concept drift is involved. Dynamic Model Trees are thus a powerful online learning framework that contributes to more lightweight and interpretable machine learning in data streams.
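
For context, the convergence guarantee of a Hoeffding Tree rests on the Hoeffding bound, which indicates how many stream samples are enough to trust a split decision. The following is a minimal Python sketch of that standard bound and the resulting split test. It is not the Dynamic Model Tree method described in the abstract, and the function names, the gain values and the `delta` default are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, n: int, delta: float = 1e-7) -> bool:
    """Split once the observed gap between the two best candidate splits
    exceeds the Hoeffding bound for the samples seen so far."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical information gains for a binary target (gain range of 1 bit).
few = should_split(best_gain=0.30, second_gain=0.25, value_range=1.0, n=200)
many = should_split(best_gain=0.30, second_gain=0.25, value_range=1.0, n=5000)
```

With the same gap between the two candidate splits, the test only passes once enough samples have arrived, which is why such trees grow incrementally as the stream is observed.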


Alterations in common marmoset gut microbiome associated with duodenal strictures - Scientific Reports

#artificialintelligence

Chronic gastrointestinal (GI) diseases are the most common diseases in captive common marmosets (Callithrix jacchus). Despite standardized housing, diet and husbandry, a recently described gastrointestinal syndrome characterized by duodenal ulcers and strictures was observed in a subset of marmosets sourced from the New England Primate Research Center. As changes in the gut microbiome have been associated with GI diseases, the gut microbiome of 52 healthy, non-stricture marmosets (153 samples) was compared to the gut microbiome of 21 captive marmosets diagnosed with a duodenal ulcer/stricture (57 samples). No significant changes were observed using alpha diversity metrics, and while the community structure was significantly different when comparing beta diversity between healthy and stricture cases, the results were inconclusive due to differences observed in the dispersion of both datasets. Differences in the abundance of individual taxa were detected using ANCOM, as stricture-associated dysbiosis was characterized by Anaerobiospirillum loss and Clostridium perfringens increases. To identify microbial and serum biomarkers that could help classify stricture cases, we developed models using machine learning algorithms (random forest, classification and regression trees, support vector machines and k-nearest neighbors) to classify microbiome, serum chemistry or complete blood count (CBC) data. Random forest (RF) models were the most accurate models and correctly classified strictures using either 9 ASVs (amplicon sequence variants), 4 serum chemistry tests or 6 CBC tests. Based on the RF model and ANCOM results, C. perfringens was identified as a potential causative agent associated with the development of strictures. Clostridium perfringens was also isolated by microbiological culture in 4 of 9 duodenum samples from marmosets with histologically confirmed strictures. Due to the enrichment of C. perfringens in situ, we analyzed frozen duodenal tissues using both 16S microbiome profiling and RNAseq. Microbiome analysis of the duodenal tissues of 29 marmosets from the MIT colony confirmed an increased abundance of Clostridium in stricture cases. Comparison of the duodenal gene expression from stricture and non-stricture marmosets found enrichment of genes associated with intestinal absorption, and lipid metabolism, localization, and transport in stricture cases. Using machine learning, we identified increased abundance of C. perfringens as a potential causative agent of GI disease and intestinal strictures in marmosets.


How to speed up machine learning operations with Jax?

#artificialintelligence

Machine learning algorithms require a lot of mathematical operations, and as models become more accurate, those operations tend to become more complex. A simple example is the random forest versus the decision tree: the random forest is more accurate in most cases, but it involves more complex mathematics and takes more time than a single decision tree. Robust modelling therefore needs a way to complete large mathematical or numerical workloads quickly and reliably. JAX is a library that can help speed up such operations. In this article, we will discuss the JAX library in detail.


GAM(L)A: An econometric model for interpretable Machine Learning

arXiv.org Machine Learning

Despite their high predictive performance, random forest and gradient boosting are often considered black boxes or uninterpretable models, which has raised concerns from practitioners and regulators. As an alternative, we propose in this paper to use partial linear models that are inherently interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and GAM-autometrics (GAMA), denoted GAM(L)A for short. GAM(L)A combines parametric and non-parametric functions to accurately capture linearities and non-linearities prevailing between dependent and explanatory variables, and a variable selection procedure to control for overfitting issues. Estimation relies on a two-step procedure building upon the double residual method. We illustrate the predictive performance and interpretability of GAM(L)A on a regression and a classification problem. The results show that GAM(L)A outperforms parametric models augmented by quadratic, cubic and interaction effects. Moreover, the results also suggest that the performance of GAM(L)A is not significantly different from that of random forest and gradient boosting.


Evaluating Local Model-Agnostic Explanations of Learning to Rank Models with Decision Paths

arXiv.org Machine Learning

Local explanations of learning-to-rank (LTR) models are thought to extract the most important features that contribute to the ranking predicted by the LTR model for a single data point. Evaluating the accuracy of such explanations is challenging since the ground truth feature importance scores are not available for most modern LTR models. In this work, we propose a systematic evaluation technique for explanations of LTR models. Instead of using black-box models, such as neural networks, we propose to focus on tree-based LTR models, from which we can extract the ground truth feature importance scores using decision paths. Once extracted, we can directly compare the ground truth feature importance scores to the feature importance scores generated with explanation techniques. We compare two recently proposed explanation techniques for LTR models when using decision trees and gradient boosting models on the MQ2008 dataset. We show that the explanation accuracy in these techniques can largely vary depending on the explained model and even which data point is explained.
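
To illustrate the idea of reading ground truth from a tree, here is a minimal Python sketch that follows one data point down a hand-built decision tree and records which features its decision path tests. The tree, the feature names and the count-based score are hypothetical. The abstract does not specify how the paper scores decision paths, so this is only one plausible reading.

```python
from collections import Counter

# Hypothetical regression tree: internal nodes test `feature <= threshold`,
# leaves carry a relevance score. Feature names are made up for illustration.
TREE = {
    "feature": "bm25", "threshold": 0.5,
    "left": {"leaf": 0.1},
    "right": {
        "feature": "pagerank", "threshold": 0.3,
        "left": {"leaf": 0.4},
        "right": {
            "feature": "bm25", "threshold": 0.8,
            "left": {"leaf": 0.7},
            "right": {"leaf": 0.9},
        },
    },
}


def decision_path(tree, x):
    """Return (features tested on the path, leaf prediction) for data point x."""
    path = []
    node = tree
    while "leaf" not in node:
        path.append(node["feature"])
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return path, node["leaf"]


def path_importance(tree, x, all_features):
    """Score each feature by how often the decision path for x tests it."""
    path, _ = decision_path(tree, x)
    counts = Counter(path)
    return {name: counts.get(name, 0) for name in all_features}


doc = {"bm25": 0.9, "pagerank": 0.6, "url_length": 12}
path, score = decision_path(TREE, doc)
importance = path_importance(TREE, doc, ["bm25", "pagerank", "url_length"])
```

Features that never appear on the path (here `url_length`) receive a score of zero, which gives a reference against which the scores produced by an explanation technique could be compared for that same data point.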


How to Interpret Machine Learning Models with Python -- Part 1 (easy)

#artificialintelligence

In this article, I will try to interpret the Linear Regression, Lasso, and Decision Tree models, which are inherently interpretable. I will analyze global interpretability, which identifies the most important features for prediction in general, and local interpretability, which explains individual prediction results. Machine learning models are used in applications such as fraud and risk detection in bank transactions, voice assistants, recommendation systems, chatbots, self-driving cars, social network analysis, etc. However, it is sometimes difficult to interpret them because the algorithm represents a black box (e.g. ...). So we need additional techniques to analyze black box decisions.
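
The difference between global and local interpretability can be sketched for a linear model in a few lines of Python. This is an illustration under stated assumptions, not the article's own code: the coefficients, feature names and data rows are hypothetical, and mean absolute contribution is just one common choice of global score.

```python
# Hypothetical fitted linear model: price = INTERCEPT + sum(coef * feature).
INTERCEPT = 50.0
COEFS = {"area": 3.0, "rooms": 10.0, "age": -0.5}


def predict(x):
    """Prediction of the linear model for one data point."""
    return INTERCEPT + sum(COEFS[name] * x[name] for name in COEFS)


def global_importance(rows):
    """Global view: mean absolute contribution |coef * value| over a dataset."""
    return {
        name: sum(abs(COEFS[name] * row[name]) for row in rows) / len(rows)
        for name in COEFS
    }


def local_explanation(x):
    """Local view: signed contribution of each feature to one prediction."""
    return {name: COEFS[name] * x[name] for name in COEFS}


# Made-up data rows, for illustration only.
rows = [
    {"area": 50.0, "rooms": 2.0, "age": 10.0},
    {"area": 100.0, "rooms": 4.0, "age": 40.0},
]
global_scores = global_importance(rows)
local_scores = local_explanation(rows[0])
```

The local contributions plus the intercept add up exactly to the model's prediction for that row, which is what makes a linear model inherently interpretable.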


The Yield Curve as a Recession Leading Indicator. An Application for Gradient Boosting and Random Forest

arXiv.org Machine Learning

The most representative decision tree ensemble methods are used to examine the variable importance of Treasury term spreads for predicting US economic recessions, while also generating rules for US economic recession detection. A strategy is proposed for training the classifiers with Treasury term spreads data, and the results are compared in order to select the best model for interpretability. We also discuss the use of the SHapley Additive exPlanations (SHAP) framework to understand US recession forecasts by analyzing feature importance. Consistent with the existing literature, we find the most relevant Treasury term spreads for predicting US economic recession and a methodology for detecting relevant rules for economic recession detection. In this case, the most relevant term spread found is 3 month to 6 month, which is proposed to be monitored by economic authorities. Finally, the methodology detected rules with high lift on predicting economic recession that can be used by these entities for this purpose. This latter result stands in contrast to a growing body of literature demonstrating that machine learning methods are useful for interpretation when comparing many alternative algorithms, and we discuss the interpretation of our result and propose further research lines aligned with this work.
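
The lift of a rule, mentioned above, is the recession rate among the periods the rule covers divided by the overall recession rate. A minimal Python sketch of that calculation follows. The observations and the rule are made-up toy values, not data or rules from the paper.

```python
def rule_lift(rows, rule, target="recession"):
    """Lift = P(target | rule fires) / P(target)."""
    base_rate = sum(r[target] for r in rows) / len(rows)
    covered = [r for r in rows if rule(r)]
    if not covered or base_rate == 0:
        return 0.0
    precision = sum(r[target] for r in covered) / len(covered)
    return precision / base_rate


def inverted_spread(row):
    """Hypothetical rule: fires when the term spread is negative."""
    return row["spread"] < 0


# Made-up toy observations: a term spread in percentage points and a 0/1 label.
rows = [
    {"spread": -0.4, "recession": 1},
    {"spread": -0.1, "recession": 1},
    {"spread": -0.2, "recession": 0},
    {"spread": 0.3, "recession": 0},
    {"spread": 0.8, "recession": 0},
    {"spread": 1.1, "recession": 0},
    {"spread": 0.5, "recession": 1},
    {"spread": 1.5, "recession": 0},
]

lift = rule_lift(rows, inverted_spread)
```

A lift above 1 means the rule picks out periods where recessions are more frequent than in the data overall; a lift near 1 means the rule carries little information.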


Getting Started with Decision Trees

#artificialintelligence

The Decision Tree algorithm is one of the most powerful algorithms in machine learning and data science. It is very commonly used by data scientists and machine learning engineers to solve business problems and to explain the results to customers easily.


15 Best Artificial Intelligence Books [Beginners, Pros & Business Leaders]

#artificialintelligence

Being well versed in statistics is absolutely critical for data scientists, and this book is particularly helpful for that. It offers a comprehensive explanation of the most vital concepts in statistical learning, and you do not need to be a mathematics expert to follow it. It also covers supervised and unsupervised learning, including SVMs, neural networks, decision trees, random forests, LASSO regression, etc.