Goto

Collaborating Authors

 Decision Tree Learning


The Fourth Industrial Revolution: How Big Data and Machine Learning Can Boost Inclusive Fintech - NextBillion

#artificialintelligence

The lending and credit scoring sector have more data than ever before at their disposal. How they leverage this data to create value for their clients and social impact determines the outcomes they can achieve in the financial services space. In 1959, Arthur Samuel, a pioneer in the field of machine learning (ML) and artificial intelligence during an era when computers filled an entire building, defined machine learning as "a field of study that gives computers the ability to learn without being explicitly programmed." During a recent keynote, Microsoft CEO Satya Nadella referred to data used in this context as "the new electricity," calling our current era a "fourth industrial revolution" following steam, electricity and digital technology. Scott Guthrie, Microsoft executive vice president, also acknowledged that data is "enabling every business to be the disrupters of their industry by harnessing the power to drive insight from this data."


ลทhat Random Forests in Python

#artificialintelligence

Random forest is a highly versatile machine learning method with numerous applications ranging from marketing to healthcare and insurance. It can be used to model the impact of marketing on customer acquisition, retention, and churn or to predict disease risk and susceptibility in patients. Random forest is capable of regression and classification. It can handle a large number of features, and it's helpful for estimating which of your variables are important in the underlying data being modeled. This is a post about random forests using Python.


Top 10 Machine Learning Algorithms for Beginners

#artificialintelligence

The study of ML algorithms has gained immense traction post the Harvard Business Review article terming a'Data Scientist' as the'Sexiest job of the 21st century'. So, for those starting out in the field of ML, we decided to do a reboot of our immensely popular Gold blog The 10 Algorithms Machine Learning Engineers need to know - albeit this post is targetted towards beginners. ML algorithms are those that can learn from data and improve from experience, without human intervention. Learning tasks may include learning the function that maps the input to the output, learning the hidden structure in unlabeled data; or'instance-based learning', where a class label is produced for a new instance by comparing the new instance (row) to instances from the training data, which were stored in memory. 'Instance-based learning' does not create an abstraction from specific instances. Supervised learning can be explained as follows: use labeled training data to learn the mapping function from the input variables (X) to the output variable (Y).


Random Forests explained intuitively

#artificialintelligence

Say, you appeared for the position of Statistical analyst at WalmartLabs. Now like most of the companies, you don't just have one round of interview. You have multiple rounds of interviews. Each one of these interviews is chaired by independent panels. Generally, even the questions asked in these interviews differ from each other.


Consistent feature attribution for tree ensembles

arXiv.org Machine Learning

Note that a newer expanded version of this paper is now available at: arXiv:1802.03888 It is critical in many applications to understand what features are important for a model, and why individual predictions were made. For tree ensemble methods these questions are usually answered by attributing importance values to input features, either globally or for a single prediction. Here we show that current feature attribution methods are inconsistent, which means changing the model to rely more on a given feature can actually decrease the importance assigned to that feature. To address this problem we develop fast exact solutions for SHAP (SHapley Additive exPlanation) values, which were recently shown to be the unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate. We integrate these improvements into the latest version of XGBoost, demonstrate the inconsistencies of current methods, and show how using SHAP values results in significantly improved supervised clustering performance. Feature importance values are a key part of understanding widely used models such as gradient boosting trees and random forests, so improvements to them have broad practical implications.


Tree Ensembles with Rule Structured Horseshoe Regularization

arXiv.org Machine Learning

We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach in Friedman and Popescu (2008) where rules from decision trees and linear terms are used in a L1-regularized regression. We modify RuleFit by replacing the L1-regularization by a horseshoe prior, which is well known to give aggressive shrinkage of noise predictor while leaving the important signal essentially untouched. This is especially important when a large number of rules are used as predictors as many of them only contribute noise. Our horseshoe prior has an additional hierarchical layer that applies more shrinkage a priori to rules with a large number of splits, and to rules that are only satisfied by a few observations. The aggressive noise shrinkage of our prior also makes it possible to complement the rules from boosting in Friedman and Popescu (2008) with an additional set of trees from random forest, which brings a desirable diversity to the ensemble. We sample from the posterior distribution using a very efficient and easily implemented Gibbs sampler. The new model is shown to outperform state-of-the-art methods like RuleFit, BART and random forest on 16 datasets. The model and its interpretation is demonstrated on the well known Boston housing data, and on gene expression data for cancer classification. The posterior sampling, prediction and graphical tools for interpreting the model results are implemented in a publicly available R package.


Ensemble Machine Learning in Python: Random Forest, AdaBoost

@machinelearnbot

In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on-par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game go using deep reinforcement learning. Machine learning is even being used to program self driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.


A beginner's guide to artificial intelligence, machine learning, and cognitive computing

#artificialintelligence

For millennia, humans have pondered the idea of building intelligent machines. Ever since, artificial intelligence (AI) has had highs and lows, demonstrated successes and unfulfilled potential. Today, the news is filled with the application of machine learning algorithms to new problems. From cancer detection and prediction to image understanding and summarization and natural language processing, AI is empowering people and changing our world. The history of modern AI has all the elements of a great drama. Beginning in the 1950s with a focus on thinking machines and interesting characters like Alan Turing and John von Neumann, AI began its first rise. Decades of booms and busts and impossibly high expectations followed, but AI and its pioneers pushed forward.


Introduction to Python Ensembles

@machinelearnbot

Ensembles have rapidly become one of the hottest and most popular methods in applied machine learning. Virtually every winning Kaggle solution features them, and many data science pipelines have ensembles in them. Put simply, ensembles combine predictions from different models to generate a final prediction, and the more models we include the better it performs. Better still, because ensembles combine baseline predictions, they perform at least as well as the best baseline model. Ensembles give us a performance boost almost for free!


Crit\`eres de qualit\'e d'un classifieur g\'en\'eraliste

arXiv.org Machine Learning

This paper considers the problem of choosing a good classifier. For each problem there exist an optimal classifier, but none are optimal, regarding the error rate, in all cases. Because there exists a large number of classifiers, a user would rather prefer an all-purpose classifier that is easy to adjust, in the hope that it will do almost as good as the optimal. In this paper we establish a list of criteria that a good generalist classifier should satisfy . We first discuss data analytic, these criteria are presented. Six among the most popular classifiers are selected and scored according to these criteria. Tables allow to easily appreciate the relative values of each. In the end, random forests turn out to be the best classifiers.