 Decision Tree Learning


Generalized Bayesian Additive Regression Trees Models: Beyond Conditional Conjugacy

arXiv.org Machine Learning

Bayesian additive regression trees have seen increased interest in recent years due to their ability to combine machine learning techniques with principled uncertainty quantification. The Bayesian backfitting algorithm used to fit BART models, however, limits their application to a small class of models for which conditional conjugacy exists. In this article, we greatly expand the domain of applicability of BART to arbitrary generalized BART models by introducing a very simple, tuning-parameter-free, reversible jump Markov chain Monte Carlo algorithm. Our algorithm requires only that the user be able to compute the likelihood and (optionally) its gradient and Fisher information. The potential applications are very broad; we consider examples in survival analysis, structured heteroskedastic regression, and gamma shape regression.
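To illustrate why dropping conditional conjugacy matters, here is a minimal Python sketch of a leaf update that needs nothing but a log-likelihood. It is a plain random-walk Metropolis-Hastings step for one leaf value under a Poisson likelihood: a simplified stand-in, not the authors' reversible jump algorithm. The function names, the Normal(0, 1) leaf prior, and the step size are illustrative assumptions. As in backfitting, `offsets` holds the summed contribution of all the other trees.

```python
import math
import random


def poisson_loglik(mu, counts, offsets):
    """Poisson log-likelihood (up to a constant) for the data in one leaf.

    Observation i has log-rate offsets[i] + mu, where offsets[i] stands for the
    summed contribution of all *other* trees (the backfitting idea in BART).
    """
    return sum(y * (o + mu) - math.exp(o + mu) for y, o in zip(counts, offsets))


def mh_leaf_update(loglik, mu0, n_iter=5000, step=0.2, prior_sd=1.0, seed=0):
    """Random-walk Metropolis-Hastings draws for a single leaf value mu.

    Needs only a callable loglik(mu); no conjugate prior is required.
    Assumed leaf prior: mu ~ Normal(0, prior_sd**2).
    """
    rng = random.Random(seed)

    def log_post(m):
        return loglik(m) - 0.5 * (m / prior_sd) ** 2

    mu, lp = mu0, log_post(mu0)
    draws = []
    for _ in range(n_iter):
        prop = mu + rng.gauss(0.0, step)  # symmetric proposal
        lp_prop = log_post(prop)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            mu, lp = prop, lp_prop
        draws.append(mu)
    return draws


if __name__ == "__main__":
    counts = [3] * 50     # toy data: 50 observations in the leaf, each with count 3
    offsets = [0.0] * 50  # pretend the other trees contribute nothing
    draws = mh_leaf_update(lambda m: poisson_loglik(m, counts, offsets), mu0=0.0)
    kept = draws[1000:]   # discard burn-in
    print(sum(kept) / len(kept))  # should land near log(3), roughly 1.1
```

Swapping in a different likelihood (Weibull survival, gamma shape, and so on) only means passing a different `loglik`; that is the flexibility the abstract describes, although the paper's own sampler is more sophisticated than this sketch.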


The Most Popular Machine Learning Algorithms Explained

#artificialintelligence

We've already talked about artificial intelligence and how much it's been evolving. Behind all those developments, there's a lot of programming & data stuff. For some, it may sound boring, but the fun part is this: the most popular algorithms are quite simple to understand. Complex systems usually have basic rules beneath to support them. So we will grab four from the top.


'Simple' AI Can Anticipate Bank Managers' Loan Decisions to Over 95% Accuracy

#artificialintelligence

A new research project has found that the discretionary decisions made by human bank managers can be replicated by machine learning systems to an accuracy of more than 95%. Using the same data available to bank managers in a privileged dataset, the best-performing algorithm in the test was a Random Forest implementation, a fairly simple approach that's twenty years old, but one that still outperformed a neural network when attempting to mimic the behavior of human bank managers formulating final decisions about loans. The Random Forest algorithm, one of four put through their paces for the project, achieves high human-equivalent scores when measured against the bank managers' own decisions, despite the relative simplicity of the algorithm. The researchers, who had access to a proprietary dataset of 37,449 loan ratings across 4,414 unique customers at 'a large commercial bank', suggest at various points in the preprint paper that the automated data analysis that managers are given to make their decision has now become so accurate that bank managers rarely deviate from it, potentially signifying that bank managers' part in the loan approval process chiefly consists of retaining someone to fire in the event of a loan default. 'From a practical perspective it is worth noting that our results may indicate that the bank could process loans faster and cheaper in the absence of human loan managers with very comparable results.'


Fast Interpretable Greedy-Tree Sums (FIGS)

arXiv.org Machine Learning

Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in many problems. Here, we propose Fast Interpretable Greedy-Tree Sums (FIGS), an algorithm for fitting concise rule-based models. Specifically, FIGS generalizes the CART algorithm to simultaneously grow a flexible number of trees in a summation. The total number of splits across all the trees can be restricted by a pre-specified threshold, thereby keeping both the size and number of its trees under control. When both are small, the fitted tree-sum can be easily visualized and written out by hand, making it highly interpretable. A partially oracle theoretical result hints at the potential for FIGS to overcome a key weakness of single-tree models by disentangling additive components of generative additive models, thereby reducing redundancy from repeated splits on the same feature. Furthermore, given oracle access to optimal tree structures, we obtain L² generalization bounds for such generative models in the case of C¹ component functions, matching known minimax rates in some cases. Extensive experiments across a wide array of real-world datasets show that FIGS achieves state-of-the-art prediction performance (among all popular rule-based methods) when restricted to just a few splits (e.g. fewer than 20). We find empirically that FIGS is able to avoid repeated splits, and often provides more concise decision rules than fitted decision trees, without sacrificing predictive performance. All code and models are released in a full-fledged package on GitHub (https://github.com/csinva/imodels).
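The greedy loop described above can be sketched in plain Python. This is a simplified, squared-error-only illustration written for this summary, not the code from the imodels package: at each step it either splits a leaf of an existing tree (scored on that tree's residuals, i.e. y minus the other trees' predictions) or starts a new tree from the full residuals, and it stops once `max_splits` splits have been made or no split helps. Only the leaf being split gets refreshed values, and the names `best_split`, `fit_figs`, and `predict_tree` are hypothetical.

```python
def best_split(X, r, idx):
    """Best single split of the samples in idx, scored by the reduction in
    squared error when each child predicts its mean residual."""
    best = (0.0, None, None)  # (gain, feature, threshold)
    n = len(idx)
    if n < 2:
        return best
    total = sum(r[i] for i in idx)
    for j in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][j])
        s_left = 0.0
        for k in range(n - 1):
            s_left += r[order[k]]
            lo, hi = X[order[k]][j], X[order[k + 1]][j]
            if lo == hi:
                continue  # cannot separate equal feature values
            n_left = k + 1
            s_right = total - s_left
            gain = s_left ** 2 / n_left + s_right ** 2 / (n - n_left) - total ** 2 / n
            if gain > best[0]:
                best = (gain, j, (lo + hi) / 2)
    return best


def leaf_matches(conds, x):
    """True if x satisfies every (feature, threshold, go_left) condition."""
    return all((x[j] <= t) == go_left for j, t, go_left in conds)


def predict_tree(tree, x):
    """A tree is a list of leaves; return the value of the leaf containing x."""
    for leaf in tree:
        if leaf_matches(leaf["conds"], x):
            return leaf["value"]
    return 0.0


def fit_figs(X, y, max_splits=5):
    """Greedily grow a sum of trees with at most max_splits splits in total."""
    n = len(y)
    trees = []
    for _ in range(max_splits):
        per_tree = [[predict_tree(t, x) for x in X] for t in trees]
        total = [sum(p[i] for p in per_tree) for i in range(n)]
        best = (1e-12, None)  # (gain, action); ignore negligible gains
        # Candidate 1: split a leaf of an existing tree, using that tree's
        # residuals (y minus the predictions of all the other trees).
        for ti, tree in enumerate(trees):
            r = [y[i] - total[i] + per_tree[ti][i] for i in range(n)]
            for li, leaf in enumerate(tree):
                gain, j, t = best_split(X, r, leaf["idx"])
                if j is not None and gain > best[0]:
                    best = (gain, (ti, li, j, t, r))
        # Candidate 2: start a new tree whose root splits the full residuals.
        r = [y[i] - total[i] for i in range(n)]
        gain, j, t = best_split(X, r, list(range(n)))
        if j is not None and gain > best[0]:
            best = (gain, (None, 0, j, t, r))
        if best[1] is None:
            break  # no split reduces the error
        ti, li, j, t, r = best[1]
        if ti is None:
            trees.append([{"conds": [], "idx": list(range(n)), "value": 0.0}])
            ti = len(trees) - 1
        parent = trees[ti].pop(li)
        for go_left in (True, False):
            idx = [i for i in parent["idx"] if (X[i][j] <= t) == go_left]
            trees[ti].append({
                "conds": parent["conds"] + [(j, t, go_left)],
                "idx": idx,
                "value": sum(r[i] for i in idx) / len(idx),  # mean residual
            })
    return trees
```

On additive toy data such as y = 1{x0 > 0.5} + 1{x1 > 0.5} sampled on a 4 × 4 grid, this loop ends with two one-split trees rather than one deeper tree that repeats the same split, which is the disentangling behaviour the abstract describes.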


Top Posts Feb 7-13: Decision Tree Algorithm, Explained - KDnuggets

#artificialintelligence

Also: How to Learn Math for Machine Learning; 7 Steps to Mastering Machine Learning with Python in 2022; Top Programming Languages and Their Uses; The Complete Collection of Data Science Cheat Sheets - Part 1


Random Forests Weighted Local Fréchet Regression with Theoretical Guarantee

arXiv.org Machine Learning

Statistical analysis is increasingly confronted with complex data from general metric spaces, such as symmetric positive definite matrix-valued data and probability distribution functions. [47] and [17] establish a general paradigm of Fréchet regression with complex metric space valued responses and Euclidean predictors. However, their proposed local Fréchet regression approach involves nonparametric kernel smoothing and suffers from the curse of dimensionality. To address this issue, in this paper we propose a novel random forests weighted local Fréchet regression paradigm. The main mechanism of our approach relies on the adaptive kernels generated by random forests. Our first method utilizes these weights as the local average to solve the Fréchet mean, while the second method performs local linear Fréchet regression, making both methods locally adaptive. Our proposals significantly improve existing Fréchet regression methods. Based on the theory of infinite order U-processes and infinite order Mmn-estimator, we establish the consistency, rate of convergence, and asymptotic normality for our proposed random forests weighted Fréchet regression estimator, which covers the current large sample theory of random forests with Euclidean responses as a special case. Numerical studies show the superiority of our two proposed methods for Fréchet regression with several commonly encountered types of responses such as probability distribution functions, symmetric positive definite matrices, and sphere data. The practical merits of our proposals are also demonstrated through the application to the human mortality distribution data.
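For one concrete metric space, the first method (forest weights used as the local average in a Fréchet mean) is easy to write down. The sketch below assumes the responses are one-dimensional probability distributions under the 2-Wasserstein metric, where the weighted Fréchet mean is the distribution whose quantile function is the weighted average of the input quantile functions. The input format (per-tree leaf ids for the training points and for the query point) and the function names are hypothetical; any fitted forest that exposes leaf indices could supply them. This is an illustration of the idea, not the authors' estimator.

```python
def forest_weights(train_leaves, query_leaves):
    """Random-forest kernel weights for one query point x.

    train_leaves[b][i] is the leaf id of training point i in tree b, and
    query_leaves[b] is the leaf id of x in tree b (hypothetical input format).
    w[i] = (1/B) * sum_b 1{i shares x's leaf in tree b} / (size of that leaf).
    """
    n_trees = len(train_leaves)
    n = len(train_leaves[0])
    w = [0.0] * n
    for b in range(n_trees):
        members = [i for i in range(n) if train_leaves[b][i] == query_leaves[b]]
        for i in members:  # an empty leaf contributes nothing in this sketch
            w[i] += 1.0 / (len(members) * n_trees)
    return w


def wasserstein_frechet_mean(quantiles, w):
    """Weighted Fréchet mean of 1-D distributions under the 2-Wasserstein metric.

    quantiles[i][k] is the quantile function of response i at the k-th level
    of a shared probability grid. The weighted Fréchet mean has the pointwise
    weighted average of these quantile functions as its quantile function.
    """
    levels = range(len(quantiles[0]))
    return [sum(wi * q[k] for wi, q in zip(w, quantiles)) for k in levels]


if __name__ == "__main__":
    train_leaves = [[0, 0, 1, 1], [0, 1, 1, 1]]  # two toy trees, four training points
    w = forest_weights(train_leaves, query_leaves=[0, 1])
    quantiles = [[i, i + 1, i + 2] for i in range(4)]  # toy quantile functions
    print(w, wasserstein_frechet_mean(quantiles, w))
```

With Euclidean responses the same weights simply average the observed y values, which is the special case the abstract says the theory covers.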


AI based home monitoring during COVID-19 pandemic

#artificialintelligence

In a recent study posted to the medRxiv preprint server, an interdisciplinary team of researchers conducted an open, prospective pilot feasibility study using an artificial intelligence (AI)-based platform to provide clinical decision support on coronavirus disease 2019 (COVID-19) outcomes. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-related symptoms and disease course pose an enormous burden on healthcare facilities. During the COVID-19 pandemic, e-telemonitoring was recommended to reduce the pressure on overwhelmed healthcare systems and limit access to the emergency department (ED). The use of big data analysis in healthcare is increasing, as illustrated by the explosion of the internet of medical things (IoMT). This study was designed to assess the application and integration of a dedicated AI-based support system with the territorial and hospital intervention plan during the COVID-19 pandemic.


Xu

AAAI Conferences

The weighted constraint satisfaction problem (WCSP) is a powerful mathematical framework for combinatorial optimization. The branch and bound search paradigm is very successful in solving the WCSP but critically depends on the ordering in which variables are instantiated. In this paper, we introduce a new framework for dynamic variable ordering for solving the WCSP. This framework is inspired by regression decision tree learning. Variables are ordered dynamically based on samples of random assignments of values to variables as well as their corresponding total weights.
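One simple reading of "ordering variables from samples of random assignments and their total weights" in regression-tree terms is to branch next on the variable that would sit at the root of a regression tree fitted to those samples: the one whose value explains the most variance in the total weight. The sketch below does that for a toy WCSP. The cost representation, the uniform sampling of completions, and the function names are assumptions for illustration, not the paper's framework.

```python
import random


def total_weight(assignment, unary, binary):
    """Total cost of a full assignment in a toy WCSP (hypothetical representation).

    unary[v][a] is the cost of variable v taking value a; binary[(u, v)] maps
    value pairs (a, b) to a cost, and pairs not listed cost 0.
    """
    cost = sum(unary[v][a] for v, a in enumerate(assignment))
    for (u, v), table in binary.items():
        cost += table.get((assignment[u], assignment[v]), 0)
    return cost


def pick_next_variable(partial, domains, unary, binary, n_samples=200, seed=0):
    """Pick the unassigned variable (None in partial) that best splits the total
    weights of random completions, like the root of a regression tree with one
    branch per value."""
    rng = random.Random(seed)
    unassigned = [v for v, a in enumerate(partial) if a is None]
    samples = []
    for _ in range(n_samples):
        a = list(partial)
        for v in unassigned:
            a[v] = rng.choice(domains[v])  # uniform random completion
        samples.append((a, total_weight(a, unary, binary)))
    best_var, best_sse = None, float("inf")
    for v in unassigned:
        groups = {}
        for a, c in samples:
            groups.setdefault(a[v], []).append(c)
        # Within-branch squared error; lower means v explains more variance.
        sse = 0.0
        for g in groups.values():
            mean = sum(g) / len(g)
            sse += sum((c - mean) ** 2 for c in g)
        if sse < best_sse:
            best_var, best_sse = v, sse
    return best_var
```

In a branch and bound search this would be called at every node with the current partial assignment, so the ordering adapts as values are fixed.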


Sridharan

AAAI Conferences

This paper describes an architecture for an agent to learn and reason about affordances. In this architecture, Answer Set Prolog, a declarative language, is used to represent and reason with incomplete domain knowledge that includes a representation of affordances as relations defined jointly over objects and actions. Reinforcement learning and decision-tree induction based on this relational representation and observations of action outcomes are used to interactively and cumulatively (a) acquire knowledge of affordances of specific objects being operated upon by specific agents; and (b) generalize from these specific learned instances. The capabilities of this architecture are illustrated and evaluated in two simulated domains: a variant of the classic Blocks World domain, and a robot assisting humans in an office environment.


How to Implement and Evaluate Decision Tree classifiers from scikit-learn

#artificialintelligence

A Decision Tree follows a tree-like structure (hence the name) whereby a node represents a specific attribute, a branch represents a decision rule, and leaf nodes represent an outcome. We will show this structure later so you can see what we mean, but you can imagine it is like one of the decision trees you used to draw in high school maths, just on a far more complicated scale. The algorithm itself works by splitting the data according to different attributes at each node while attempting to reduce a selection measure (often the Gini index). In essence, the aim of a Decision Tree classifier is to split the data according to attributes while being able to classify the data accurately into predefined groups (our target variable). For this decision tree implementation we will use the iris dataset from sklearn, which is relatively simple to understand and easy to implement.
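To make "reducing a selection measure" concrete, here is a minimal pure-Python sketch of how a single split on one numeric attribute can be chosen with the Gini index. Gini impurity is also the default criterion of scikit-learn's `DecisionTreeClassifier` (`criterion="gini"`), but the helper names below are our own and this is not scikit-learn's implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k**2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Best threshold on one numeric attribute, minimising the size-weighted
    Gini impurity of the two child nodes.

    Returns (weighted impurity, threshold); threshold is None if no split
    improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (gini(labels), None)
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal attribute values
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, (pairs[k - 1][0] + pairs[k][0]) / 2)
    return best


if __name__ == "__main__":
    # Made-up measurements for two classes, purely for illustration.
    print(best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))
```

A full tree repeats this search over every attribute at every node and recurses into the children; on the iris data the attributes would be the four flower measurements.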