Goto

Collaborating Authors

 Decision Tree Learning


Discriminatory AI explained with an example

#artificialintelligence

AI is increasingly used in making decisions that impact us directly such as job applications, our credit rating, match-making on dating sites. So it is important that AI is non-discriminatory and that decisions do not favor certain races, gender, the color of skin. Discriminatory AI is a very wide subject going beyond purely technical aspects. However, to make it easily understandable, I will demonstrate how discriminatory AI looks using examples and visuals. This will give you a way to spot a discriminatory AI. Let me first establish the context of the example.


An Efficient Approach for Optimizing the Cost-effective Individualized Treatment Rule Using Conditional Random Forest

arXiv.org Machine Learning

Evidence from observational studies has become increasingly important for supporting healthcare policy making via cost-effectiveness (CE) analyses. Similar as in comparative effectiveness studies, health economic evaluations that consider subject-level heterogeneity produce individualized treatment rules (ITRs) that are often more cost-effective than one-size-fits-all treatment. Thus, it is of great interest to develop statistical tools for learning such a cost-effective ITR (CE-ITR) under the causal inference framework that allows proper handling of potential confounding and can be applied to both trials and observational studies. In this paper, we use the concept of net-monetary-benefit (NMB) to assess the trade-off between health benefits and related costs. We estimate CE-ITR as a function of patients' characteristics that, when implemented, optimizes the allocation of limited healthcare resources by maximizing health gains while minimizing treatment-related costs. We employ the conditional random forest approach and identify the optimal CE-ITR using NMB-based classification algorithms, where two partitioned estimators are proposed for the subject-specific weights to effectively incorporate information from censored individuals. We conduct simulation studies to evaluate the performance of our proposals. We apply our top-performing algorithm to the NIH-funded Systolic Blood Pressure Intervention Trial (SPRINT) to illustrate the CE gains of assigning customized intensive blood pressure therapy.


Understanding your Neural Network's predictions

#artificialintelligence

Neural networks are extremely convenient. They are usable for both regression and classification, work on structured and unstructured data, handle temporal data very well, and can usually reach high performances if they are given a sufficient amount of data. What is gained in convenience is, however, lost in interpretability and that can be a major setback when models are presented to a non-technical audience, such as clients or stakeholders. For instance, last year, the Data Science team I am part of wanted to convince a client to go from a decision tree model to a neural network, and for good reasons: we had access to a large amount of data and most of it was temporal. The client was on board, but wanted to keep an understanding of what the model based its decisions on, which means evaluating its features' importance.


The Application of Machine Learning Techniques for Predicting Match Results in Team Sport: A Review

Journal of Artificial Intelligence Research

Predicting the results of matches in sport is a challenging and interesting task. In this paper, we review a selection of studies from 1996 to 2019 that used machine learning for predicting match results in team sport. Considering both invasion sports and striking/fielding sports, we discuss commonly applied machine learning algorithms, as well as common approaches related to data and evaluation. Our study considers accuracies that have been achieved across different sports, and explores whether evidence exists to support the notion that outcomes of some sports may be inherently more difficult to predict. We also uncover common themes of future research directions and propose recommendations for future researchers. Although there remains a lack of benchmark datasets (apart from in soccer), and the differences between sports, datasets and features makes between-study comparisons difficult, as we discuss, it is possible to evaluate accuracy performance in other ways. Artificial Neural Networks were commonly applied in early studies, however, our findings suggest that a range of models should instead be compared. Selecting and engineering an appropriate feature set appears to be more important than having a large number of instances. For feature selection, we see potential for greater inter-disciplinary collaboration between sport performance analysis, a sub-discipline of sport science, and machine learning.


Decision Trees vs Random Forest

#artificialintelligence

Last week I published two articles about Decision Trees: one about Decision and Classification Tree (CART) and another tutorial on how to implement Random Forest classifier. These two methods may look very similar, however there are important differences that every data professional or enthusiastic should know.


Wrote about Decision trees -- Karthikeyan A K

#artificialintelligence

Once again machine learning bug bit me, and after a long delay I wrote about Decision Trees in my book Introduction To DataScience. I am looking to write about K-nearest neighbors next. Even though I have a work where the client hasn't yet given me a unnecessary trouble yet (which has not been the case for years now, and it looks like I have entered dream land or something), work is taking time, and I have to tend to what gives me bread first. But learning Data Science from scratch is my goal, and I will achieve it. I would be very happy if people can read it and mail their feedback to [email protected], no matter where in the world I am, mail works if I have the internet, and I can take corrective measures and possibly answer you.


Data Mining with Rattle

#artificialintelligence

Rattle and R deliver a very sophisticated data mining environment. Data Mining with Rattle is a unique course that instructs with respect to both the concepts of data mining, as well as to the "hands-on" use of a popular, contemporary data mining software tool, "Data Miner," also known as the'Rattle' package in R software. Rattle is a popular GUI-based software tool which'fits on top of' R software. The course focuses on life-cycle issues, processes, and tasks related to supporting a'cradle-to-grave' data mining project. These include: data exploration and visualization; testing data for random variable family characteristics and distributional assumptions; transforming data by scale or by data type; performing cluster analyses; creating, analyzing and interpreting association rules; and creating and evaluating predictive models that may utilize: regression; generalized linear modeling (GLMs); decision trees; recursive partitioning; random forests; boosting; and/or support vector machine (SVM) paradigms. It is both a conceptual and a practical course as it teaches and instructs about data mining, and provides ample demonstrations of conducting data mining tasks using the Rattle R package.


Q-learning with online random forests

arXiv.org Machine Learning

$Q$-learning is the most fundamental model-free reinforcement learning algorithm. Deployment of $Q$-learning requires approximation of the state-action value function (also known as the $Q$-function). In this work, we provide online random forests as $Q$-function approximators and propose a novel method wherein the random forest is grown as learning proceeds (through expanding forests). We demonstrate improved performance of our methods over state-of-the-art Deep $Q$-Networks in two OpenAI gyms (`blackjack' and `inverted pendulum') but not in the `lunar lander' gym. We suspect that the resilience to overfitting enjoyed by random forests recommends our method for common tasks that do not require a strong representation of the problem domain. We show that expanding forests (in which the number of trees increases as data comes in) improve performance, suggesting that expanding forests are viable for other applications of online random forests beyond the reinforcement learning setting.


Machine Learning-Based GPS Multipath Detection Method Using Dual Antennas

arXiv.org Artificial Intelligence

In urban areas, global navigation satellite system (GNSS) signals are often reflected or blocked by buildings, thus resulting in large positioning errors. In this study, we proposed a machine learning approach for global positioning system (GPS) multipath detection that uses dual antennas. A machine learning model that could classify GPS signal reception conditions was trained with several GPS measurements selected as suggested features. We applied five features for machine learning, including a feature obtained from the dual antennas, and evaluated the classification performance of the model, after applying four machine learning algorithms: gradient boosting decision tree (GBDT), random forest, decision tree, and K-nearest neighbor (KNN). It was found that a classification accuracy of 82%-96% was achieved when the test data set was collected at the same locations as those of the training data set. However, when the test data set was collected at locations different from those of the training data, a classification accuracy of 44%-77% was obtained.


How to know when AI is the right solution

#artificialintelligence

AI adoption is on the rise. According to a recent McKinsey survey, 55% of companies use artificial intelligence in at least one function, and 27% attribute at least 5% of earnings before interest and taxes to AI, much of that in the form of cost savings. As AI will dramatically transform nearly every industry it touches, it's no surprise that vendors and enterprises are looking for opportunities to deploy AI everywhere they can. But not every project can benefit from AI and attempting to apply AI inappropriately can not only cost time and money but also sour employees, customers, and corporate leaders on future AI projects. The key factors for determining whether a project is suitable for AI are business value, availability of training data, and cultural readiness for change.