Goto

Collaborating Authors

 Ensemble Learning


A Recommendation and Risk Classification System for Connecting Rough Sleepers to Essential Outreach Services

arXiv.org Machine Learning

Rough sleeping is a chronic problem faced by some of the most disadvantaged people in modern society. This paper describes work carried out in partnership with Homeless Link, a UK-based charity, in developing a data-driven approach to assess the quality of incoming alerts from members of the public aimed at connecting people sleeping rough on the streets with outreach service providers. Alerts are prioritised based on the predicted likelihood of successfully connecting with the rough sleeper, helping to address capacity limitations and to quickly, effectively, and equitably process all of the alerts that they receive. Initial evaluation concludes that our approach increases the rate at which rough sleepers are found following a referral by at least 15\% based on labelled data, implying a greater overall increase when the alerts with unknown outcomes are considered, and suggesting the benefit in a trial taking place over a longer period to assess the models in practice. The discussion and modelling process is done with careful considerations of ethics, transparency and explainability due to the sensitive nature of the data in this context and the vulnerability of the people that are affected.


Surrogate Locally-Interpretable Models with Supervised Machine Learning Algorithms

arXiv.org Machine Learning

Supervised Machine Learning (SML) algorithms, such as Gradient Boosting, Random Forest, and Neural Networks, have become popular in recent years due to their superior predictive performance over traditional statistical methods. However, their complexity makes the results hard to interpret without additional tools. There has been a lot of recent work in developing global and local diagnostics for interpreting SML models. In this paper, we propose a locally-interpretable model that takes the fitted ML response surface, partitions the predictor space using model-based regression trees, and fits interpretable main-effects models at each of the nodes. We adapt the algorithm to be efficient in dealing with high-dimensional predictors. While the main focus is on interpretability, the resulting surrogate model also has reasonably good predictive performance.


Supervised Machine Learning Techniques: An Overview with Applications to Banking

arXiv.org Machine Learning

This article provides an overview of Supervised Machine Learning (SML) with a focus on applications to banking. The SML techniques covered include Bagging (Random Forest or RF), Boosting (Gradient Boosting Machine or GBM) and Neural Networks (NNs). We begin with an introduction to ML tasks and techniques. This is followed by a description of: i) tree-based ensemble algorithms including Bagging with RF and Boosting with GBMs, ii) Feedforward NNs, iii) a discussion of hyper-parameter optimization techniques, and iv) machine learning interpretability. The paper concludes with a comparison of the features of different ML algorithms. Examples taken from credit risk modeling in banking are used throughout the paper to illustrate the techniques and interpret the results of the algorithms.


[D] What's the best deep-dive explanation of XGBoost hyperparameters out there?

#artificialintelligence

I'm not a total newbie, so I'd thank for all those "how to get started with xgboost" articles which there are plenty of. I remember having bumped into a site or blog with a great and comprehensive summary of each hyperparameter, but I lost that link and can't find it know from search. As far as I remember, it had a hyperparameter menu on the left, probably referred to all boosting trees and their hyperparameters and was created by some women. Anybody can recall that source?


Interpretabilit\'e des mod\`eles : \'etat des lieux des m\'ethodes et application \`a l'assurance

arXiv.org Machine Learning

Since May 2018, the General Data Protection Regulation (GDPR) has introduced new obligations to industries. By setting a legal framework, it notably imposes strong transparency on the use of personal data. Thus, people must be informed of the use of their data and must consent the usage of it. Data is the raw material of many models which today make it possible to increase the quality and performance of digital services. Transparency on the use of data also requires a good understanding of its use through different models. The use of models, even if efficient, must be accompanied by an understanding at all levels of the process that transform data (upstream and downstream of a model), thus making it possible to define the relationships between the individual's data and the choice that an algorithm could make based on the analysis of the latter. (For example, the recommendation of one product or one promotional offer or an insurance rate representative of the risk.) Models users must ensure that models do not discriminate against and that it is also possible to explain its result. The widening of the panel of predictive algorithms - made possible by the evolution of computing capacities -- leads scientists to be vigilant about the use of models and to consider new tools to better understand the decisions deduced from them . Recently, the community has been particularly active on model transparency with a marked intensification of publications over the past three years. The increasingly frequent use of more complex algorithms (\textit{deep learning}, Xgboost, etc.) presenting attractive performances is undoubtedly one of the causes of this interest. This article thus presents an inventory of methods of interpreting models and their uses in an insurance context.


What is Bootstrap Sampling in Machine Learning and Why is it Important?

#artificialintelligence

The Bootstrap Sampling Method is a very simple concept and is a building block for some of the more advanced machine learning algorithms like AdaBoost and XGBoost. However, when I started my data science journey, I couldn't quite understand the point of it. So my goals are to explain what the bootstrap method is and why it's important to know! Technically speaking, the bootstrap sampling method is a resampling method that uses random sampling with replacement. Don't worry if that sounded confusing, let me explain it with a diagram: Suppose you have an initial sample with 3 observations.


Classification with Random Forests in Python

#artificialintelligence

The random forests algorithm is a machine learning method that can be used for supervised learning tasks such as classification and regression. The algorithm works by constructing a set of decision trees trained on random subsets of features. In the case of classification, the output of a random forest model is the mode of the predicted classes across the decision trees. In this post, we will discuss how to build random forest models for classification tasks in python. In this post, you'll see Classification with Random Forests in Python The random forests algorithm is a machine learning method that can be used for supervised learning tasks such as classification and regression.


A complete explanation of Random Forest Algorithm.

#artificialintelligence

Ensemble learning is a technique where there is a joining of different types of algorithm or same types of algorithm and then it forms a more powerful regression and classification model. Here, in the random forest algorithm, it combines with multiple decision trees and forms a model. Because of its diversity and simplicity, it is one of the most used algorithms. It is used for both classification and regression problems.


How to Develop a Bagging Ensemble with Python

#artificialintelligence

Bagging is an ensemble machine learning algorithm that combines the predictions from many decision trees. It is also easy to implement given that it has few key hyperparameters and sensible heuristics for configuring these hyperparameters. Bagging performs well in general and provides the basis for a whole field of ensemble of decision tree algorithms such as the popular random forest and extra trees ensemble algorithms, as well as the lesser-known Pasting, Random Subspaces, and Random Patches ensemble algorithms. In this tutorial, you will discover how to develop Bagging ensembles for classification and regression. How to Develop a Bagging Ensemble in Python Photo by daveynin, some rights reserved. Bootstrap Aggregation, or Bagging for short, is an ensemble machine learning algorithm. Specifically, it is an ensemble of decision tree models, although the bagging technique can also be used to combine the predictions of other types of models.


Random Forest for Dissimilarity-based Multi-view Learning

arXiv.org Machine Learning

Many classification problems are naturally multi-view in the sense their data are described through multiple heterogeneous descriptions. For such tasks, dissimilarity strategies are effective ways to make the different descriptions comparable and to easily merge them, by (i) building intermediate dissimilarity representations for each view and (ii) fusing these representations by averaging the dissimilarities over the views. In this work, we show that the Random Forest proximity measure can be used to build the dissimilarity representations, since this measure reflects similarities between features but also class membership. We then propose a Dynamic View Selection method to better combine the view-specific dissimilarity representations. This allows to take a decision, on each instance to predict, with only the most relevant views for that instance. Experiments are conducted on several real-world multi-view datasets, and show that the Dynamic View Selection offers a significant improvement in performance compared to the simple average combination and two state-of-the-art static view combinations.