 Decision Tree Learning


Derisking machine learning and artificial intelligence

#artificialintelligence

The added risk brought on by the complexity of machine-learning models can be mitigated by making well-targeted modifications to existing validation frameworks. Machine learning and artificial intelligence are set to transform the banking industry, using vast amounts of data to build models that improve decision making, tailor services, and improve risk management. According to the McKinsey Global Institute, this could generate value of more than $250 billion in the banking industry.[1]

[1] For the purposes of this article, machine learning is broadly defined to include algorithms that learn from data without being explicitly programmed, including, for example, random forests, boosted decision trees, support-vector machines, deep learning, and reinforcement learning. The definition includes both supervised and unsupervised algorithms. For a full primer on the applications of artificial intelligence, we refer the reader to "An executive's guide to AI."


ML: Hukou System and Health Outcomes

#artificialintelligence

University of Johannesburg and CIRANO; SKEMA. AEA 2019 - Atlanta. Introduction: China's rapid development has spurred large-scale migration from rural to urban areas. Between 1990 and the end of 2015, the proportion of China's population living in urban areas jumped from 26% to 56%. According to current census estimates, there are more than 240 million rural-to-urban migrants and more than 160 million people working in cities outside of their hukou.


Artificial Intelligence Decision Tree

#artificialintelligence

In this article we will discuss decision points for selecting the right components for Artificial Intelligence (AI) solutions. This is also an update to Machine Learning Decision Tree (v1). Keep in mind that AI is a broader term than Machine Learning.


On the consistency of supervised learning with missing values

arXiv.org Machine Learning

In many application settings, the data are plagued with missing features. These hinder data analysis. An abundant literature addresses missing values in an inferential framework, where the aim is to estimate parameters and their variance from incomplete tables. Here, we consider supervised-learning settings where the objective is to best predict a target when missing values appear in both training and test sets. We analyze which missing-values strategies lead to good prediction. We show the consistency of two approaches to estimating the prediction function. The most striking one shows that the widely-used method of mean imputation prior to learning is consistent when missing values are not informative. This is in contrast with inferential settings, as mean imputation is known to have serious drawbacks in terms of deformation of the joint and marginal distribution of the data. That such a simple approach can be consistent has important consequences in practice. This result holds asymptotically when the learning algorithm is consistent in itself. We contribute additional analysis on decision trees, as they can naturally tackle empirical risk minimization with missing values. This is due to their ability to handle the half-discrete nature of variables with missing values. After comparing theoretically and empirically different missing-values strategies in trees, we recommend using the "missing incorporated in attributes" method, as it can handle both non-informative and informative missing values.
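To make the contrast between the two strategies concrete, here is a minimal Python sketch, not the authors' implementation. It fits a one-split regression tree (a stump) in two ways: after mean imputation, and with a "missing incorporated in attributes" style search that also tries sending missing values left, sending them right, or isolating them on their own side. The function names and the toy data are assumptions made for illustration; the data are built so that missingness is informative (the target is 1 exactly when the feature is missing).

```python
from statistics import mean

def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def goes_left(x, threshold, missing_left):
    """Route one value; threshold=None encodes the 'missing vs. observed' split."""
    if x is None:
        return missing_left
    if threshold is None:
        return False  # in the missing-vs-observed split, observed values go right
    return x <= threshold

def fit_mia_stump(xs, ys):
    """Return (loss, threshold, missing_left) for the best single split.

    Candidate splits: every midpoint threshold with missing values sent
    left or right, plus 'missing vs. observed' when any value is missing.
    """
    observed = sorted({x for x in xs if x is not None})
    thresholds = [(a + b) / 2 for a, b in zip(observed, observed[1:])]
    candidates = [(t, side) for t in thresholds for side in (True, False)]
    if any(x is None for x in xs):
        candidates.append((None, True))
    best = None
    for threshold, missing_left in candidates:
        left = [y for x, y in zip(xs, ys) if goes_left(x, threshold, missing_left)]
        right = [y for x, y in zip(xs, ys) if not goes_left(x, threshold, missing_left)]
        loss = sse(left) + sse(right)
        if best is None or loss < best[0]:
            best = (loss, threshold, missing_left)
    return best

# Toy data (hypothetical): the target is 1 exactly when the feature is missing.
xs = [1, 2, 3, 4, None, None]
ys = [0, 0, 0, 0, 1, 1]

# Strategy 1: mean imputation, then an ordinary stump on the completed data.
fill = mean(x for x in xs if x is not None)
imputed_xs = [fill if x is None else x for x in xs]
imputed_fit = fit_mia_stump(imputed_xs, ys)

# Strategy 2: search over the extra missing-value routings directly.
mia_fit = fit_mia_stump(xs, ys)
```

On this toy example the imputed values land in the middle of the observed range, so no single threshold separates them, while the missing-aware search can isolate them. This only illustrates the informative-missingness case; it says nothing about the asymptotic results in the abstract.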


Classification and Regression Trees

#artificialintelligence

Learn about CART in this guest post by Jillur Quddus, a lead technical architect, polyglot software engineer and data scientist with over 10 years of hands-on experience in architecting and engineering distributed, scalable, high-performance, and secure solutions used to combat serious organized crime, cybercrime, and fraud. Although linear regression models allow us to predict a numerical outcome and logistic regression models allow us to predict a categorical outcome, both of these models assume a linear relationship between variables. Classification and Regression Trees (CART) overcome this problem by generating decision trees. These decision trees can then be traversed to come to a final decision, where the outcome can either be numerical (regression trees) or categorical (classification trees). When traversing a decision tree, start at the top. Thereafter, traverse left for yes, or positive responses, and traverse right for no, or negative responses.
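The traversal rule described above (start at the top, go left for yes, right for no) can be sketched in a few lines of Python. This is only an illustration: the tree, the feature names (`income`, `has_guarantor`), and the threshold are hypothetical and do not come from the guest post, and the sketch covers traversal only, not how CART learns the splits.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    # Final decision: a category (classification) or a number (regression).
    prediction: Union[str, float]

@dataclass
class Node:
    question: str                      # human-readable description of the test
    test: Callable[[dict], bool]       # returns True for "yes"
    yes: Union["Node", Leaf]           # left branch, taken on a positive response
    no: Union["Node", Leaf]            # right branch, taken on a negative response

def predict(tree, sample):
    """Walk from the root to a leaf, going left on yes and right on no."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.test(sample) else node.no
    return node.prediction

# Hypothetical classification tree for illustration only.
tree = Node(
    question="income > 50000?",
    test=lambda s: s["income"] > 50000,
    yes=Leaf("approve"),
    no=Node(
        question="has_guarantor?",
        test=lambda s: s["has_guarantor"],
        yes=Leaf("approve"),
        no=Leaf("decline"),
    ),
)

decision = predict(tree, {"income": 30000, "has_guarantor": False})
```

A regression tree would use the same traversal; only the leaves change, holding numbers instead of category labels.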


Silencing Malware with AI

#artificialintelligence

Stuart McClure is on a personal mission. After more than two decades in the anti-malware industry, he firmly believes that ninety percent of malware attacks today can be prevented by not clicking on this, not clicking on that, and not opening that attachment either. While he's not the first nor alone in suggesting the user bears at least some responsibility, the anti-malware industry up until now hasn't yet produced an effective alternative to signature-based solutions based on known attacks. McClure's company, Cylance, thinks it has the answer with its first-generation AI-driven anti-malware products for both enterprises and consumers. "Why couldn't we simply train a computer to think like a cybersecurity professional to know what to do and not to do based on the characteristics and features of known attacks?" asked McClure.


Improve Machine Learning Results with Ensemble Learning

#artificialintelligence

NOTE: This article assumes that you have a basic understanding of Machine Learning algorithms. Suppose you want to buy a new mobile phone. Would you walk directly into the first shop and purchase a phone based on the advice of the shopkeeper? More likely, you would visit some online mobile seller sites where you can see a variety of mobile phones, their specifications, features, and prices. You may also consider the reviews that people posted on the site. You would probably also ask your friends and colleagues for their opinions.
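The phone-buying analogy corresponds to one of the simplest ensemble schemes, majority voting: collect several independent opinions and go with the most common one. A minimal Python sketch follows; the sources and phone names are made up for illustration, and the article itself may go on to cover other ensemble methods.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical opinions gathered from different sources.
opinions = {
    "shopkeeper": "phone_a",
    "online_reviews": "phone_b",
    "friends": "phone_b",
}

choice = majority_vote(list(opinions.values()))
```

In a machine-learning setting, each opinion would be the prediction of a separately trained model for the same input.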


Automated ASPECTS on Noncontrast CT Scans in Patients with Acute Ischemic Stroke Using Machine Learning

#artificialintelligence

BACKGROUND AND PURPOSE: Alberta Stroke Program Early CT Score (ASPECTS) was devised as a systematic method to assess the extent of early ischemic change on noncontrast CT (NCCT) in patients with acute ischemic stroke (AIS). Our aim was to automate ASPECTS to objectively score NCCT of AIS patients. MATERIALS AND METHODS: We collected NCCT images with a 5-mm thickness of 257 patients with acute ischemic stroke (<8 hours from onset to scans) followed by a diffusion-weighted imaging (DWI) acquisition within 1 hour. Expert ASPECTS readings on DWI were used as ground truth. Texture features were extracted from each ASPECTS region of the 157 training patient images to train a random forest classifier. The unseen 100 testing patient images were used to evaluate the performance of the trained classifier.


Seeds of Machine Learning - SageORB

#artificialintelligence

Machine learning is one of the most powerful forces in technology. Its development is shaping the future of artificial intelligence across industries. Machine learning refers to the automated process by which machines extract meaningful patterns from data. Without machine learning, artificial intelligence as we know it wouldn't be possible. In 1959, Arthur Samuel, then an engineer at IBM, coined the term "machine learning" and described it as a "Field of study that gives computers the ability to learn without being explicitly programmed."


A Machine Learning based Robust Prediction Model for Real-life Mobile Phone Data

arXiv.org Machine Learning

Real-life mobile phone data may contain noisy instances, which is a fundamental issue when building a prediction model and has many potential negative consequences. The complexity of the inferred model may increase, overfitting may arise, and the overall prediction accuracy of the model may thereby decrease. In this paper, we address these issues and present a robust prediction model for real-life mobile phone data of individual users, in order to improve the prediction accuracy of the model. In our robust model, we first identify and eliminate the noisy instances from the training dataset by determining a dynamic noise threshold using a naive Bayes classifier and the Laplace estimator; this threshold may differ from user to user according to their unique behavioral patterns. After that, we employ the most popular rule-based machine learning classification technique, i.e., the decision tree, on the noise-free quality dataset to build the prediction model. Experimental results on the real-life mobile phone datasets (e.g., phone call logs) of individual mobile phone users show the effectiveness of our robust model in terms of precision, recall, and F-measure.
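The noise-filtering step can be illustrated with a rough Python sketch. It is not the paper's method: the abstract does not say how the dynamic, per-user threshold is computed, so the sketch assumes a fixed threshold of 0.5, and the feature values and labels are invented. It trains a categorical naive Bayes model with Laplace (add-one) smoothing, then flags every training instance whose own label receives a posterior probability below the threshold.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    values = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, label)][v] += 1
            values[j].add(v)
    return class_counts, value_counts, values

def posterior(model, row):
    """Class posteriors for one row, using Laplace (add-one) smoothing."""
    class_counts, value_counts, values = model
    n, k = sum(class_counts.values()), len(class_counts)
    scores = {}
    for label, count in class_counts.items():
        p = (count + 1) / (n + k)         # smoothed prior
        for j, v in enumerate(row):
            # smoothed likelihood of value v for feature j given the class
            p *= (value_counts[(j, label)][v] + 1) / (count + len(values[j]))
        scores[label] = p
    total = sum(scores.values())
    return {label: p / total for label, p in scores.items()}

def find_noisy(rows, labels, threshold=0.5):
    """Indices whose own label has posterior below threshold (assumed fixed)."""
    model = fit_naive_bayes(rows, labels)
    return [i for i, (row, label) in enumerate(zip(rows, labels))
            if posterior(model, row)[label] < threshold]

# Hypothetical call-log data: (location,) -> user's response to a call.
rows = [("work",)] * 4 + [("home",)] * 4 + [("work",)]
labels = ["reject"] * 4 + ["accept"] * 4 + ["accept"]   # last one is atypical

noisy = find_noisy(rows, labels)
clean_rows = [r for i, r in enumerate(rows) if i not in noisy]
clean_labels = [l for i, l in enumerate(labels) if i not in noisy]
```

The cleaned rows and labels would then be passed to a decision tree learner to build the prediction model.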