AITopics | Ensemble Learning


Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)


Comparing interpretability and explainability for feature selection

Dunn, Jack, Mingardi, Luca, Zhuo, Ying Daisy

arXiv.org Machine Learning | May-11-2021

A common approach to feature selection is to examine the variable importance scores of a machine learning model as a way to understand which features are most relevant for making predictions. Given the significance of feature selection, it is crucial that the calculated importance scores reflect reality. Falsely overestimating the importance of irrelevant features can lead to false discoveries, while underestimating the importance of relevant features may lead us to discard important features, resulting in poor model performance. Additionally, black-box models like XGBoost provide state-of-the-art predictive performance but cannot be easily understood by humans, so we rely on variable importance scores or explainability methods like SHAP to offer insight into their behavior. In this paper, we investigate the performance of variable importance as a feature selection method across various black-box and interpretable machine learning methods. We compare the ability of CART, Optimal Trees, XGBoost and SHAP to correctly identify the relevant subset of variables across a number of experiments. The results show that regardless of whether we use the native variable importance method or SHAP, XGBoost fails to clearly distinguish between relevant and irrelevant features. On the other hand, the interpretable methods are able to correctly and efficiently identify irrelevant features, and thus offer significantly better performance for feature selection.
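To make the idea of scoring features by importance concrete, here is a minimal sketch using permutation importance. This is a swapped-in, model-agnostic technique chosen for brevity; it is not the native XGBoost importance or SHAP that the abstract evaluates. The toy data, the `model` function and the helper names are all hypothetical.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of model predictions over the rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_features, rng):
    """Increase in MSE after shuffling each feature column in turn."""
    baseline = mse(model, rows, targets)
    scores = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        scores.append(mse(model, shuffled, targets) - baseline)
    return scores

rng = random.Random(0)
# Hypothetical data: two features, only feature 0 drives the target.
rows = [[rng.random(), rng.random()] for _ in range(200)]
targets = [3.0 * r[0] for r in rows]
model = lambda r: 3.0 * r[0]  # stand-in for a fitted model (assumption)

scores = permutation_importance(model, rows, targets, 2, rng)
print(scores)  # feature 0 gets a positive score, feature 1 gets zero
```

A feature selection step would then keep only the features whose score exceeds some threshold. The abstract's concern is precisely whether such scores separate relevant from irrelevant features cleanly for a given model.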

experiment, feature selection, unique value, (14 more...)

arXiv.org Machine Learning

2105.05328

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.82)


An Extensive Analytical Approach on Human Resources using Random Forest Algorithm

Papineni, Swarajya Lakshmi V, Reddy, A. Mallikarjuna, Yarlagadda, Sudeepti, Yarlagadda, Snigdha, Akkinen, Haritha

arXiv.org Artificial Intelligence | May-7-2021

The current job survey shows that most software employees are planning to change their job role because of the high pay offered for newer roles in fields such as data science, business analysis and artificial intelligence. The survey also indicated that work-life imbalance, low pay, uneven shifts and many other factors make employees think about changing their work life. In this paper, to support efficient organisation of a company's human resources, the proposed system builds a model using a random forest algorithm that considers different employee parameters. This helps the HR department retain employees by identifying gaps, and helps the organisation run smoothly with a good employee retention ratio. This combination of HR and data science can improve the productivity, collaboration and well-being of the organisation's employees. It also helps to develop strategies that have an impact on the performance of employees in terms of external and social factors.

algorithm, class label, entropy, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.14445/22315381/IJETT-V69I5P217

2105.07855

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Singapore (0.04)
Asia > India > Telangana > Hyderabad (0.04)
(3 more...)

Genre: Research Report (0.65)

Industry:

Information Technology (0.88)
Education > Educational Setting (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.74)


Machine Learning in Python Bootcamp with 5 Capstone Projects Coupon

#artificialintelligence | Apr-30-2021, 09:30:43 GMT

Udemy coupon code for Machine Learning in Python Bootcamp with 5 Capstone Projects. Find other highest-rated and bestselling machine learning courses with discount coupon codes.

logistic regression, regression, theory and practical implementation, (11 more...)

#artificialintelligence

Industry: Education (0.99)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)


Machine Learning Models Can Predict Persistence of Early Childhood Asthma - Pulmonology Advisor

#artificialintelligence | Apr-30-2021, 02:40:34 GMT

Machine learning modules can be trained with the use of electronic health record (EHR) data to differentiate between transient and persistent cases of early childhood asthma, according to the results of an analysis published in PLoS One. Researchers conducted a retrospective cohort study using data derived from the Pediatric Big Data (PBD) resource at the Children's Hospital of Philadelphia (CHOP) -- a pediatric tertiary academic medical center located in Pennsylvania. The researchers sought to develop machine learning modules that could be used to identify individuals who were diagnosed with asthma at age 5 years or younger whose symptoms will persist and who will thus continue to experience asthma-related visits. They trained 5 machine learning modules to distinguish individuals without any subsequent asthma-related visits (transient asthma diagnosis) from those who did experience asthma-related visits from 5 to 10 years of age (persistent asthma diagnosis), based on clinical information available for these children up to 5 years of age. The PBD resource used in the current study included data obtained from the CHOP Care Network -- a primary care network of more than 30 sites -- and from CHOP Specialty Care and Surgical Centers.

asthma diagnosis, diagnosis, early childhood asthma, (12 more...)

#artificialintelligence

Country: North America > United States > Pennsylvania (0.25)

Genre:

Instructional Material > Course Syllabus & Notes (0.70)
Research Report > New Finding (0.55)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.56)
Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.36)


Probabilistic water demand forecasting using quantile regression algorithms

#artificialintelligence | Apr-29-2021, 03:45:25 GMT

Machine and statistical learning algorithms can be reliably automated and applied at scale. Therefore, they can constitute a considerable asset for designing practical forecasting systems, such as those related to urban water demand. Quantile regression algorithms are statistical and machine learning algorithms that can provide probabilistic forecasts in a straightforward way, and have not so far been applied to urban water demand forecasting. In this work, we aim to fill this gap by automating and extensively comparing several quantile-regression-based practical systems for probabilistic one-day-ahead urban water demand forecasting. For designing the practical systems, we use five individual algorithms (i.e., the quantile regression, linear boosting, generalized random forest, gradient boosting machine and quantile regression neural network algorithms), their mean combiner and their median combiner.
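As a rough illustration of the mean and median combiners mentioned above, the sketch below combines quantile forecasts from several algorithms pointwise and scores them with the quantile (pinball) loss, a standard loss for quantile forecasts. The forecast values, the 0.9 quantile level and the function names are hypothetical; the abstract does not state how the paper's systems were evaluated.

```python
from statistics import mean, median

def pinball_loss(y_true, y_pred, tau):
    """Average quantile (pinball) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)

def combine(forecasts, how="mean"):
    """Combine per-algorithm forecasts pointwise (one inner list per algorithm)."""
    agg = mean if how == "mean" else median
    return [agg(values) for values in zip(*forecasts)]

# Hypothetical 0.9-quantile forecasts from three algorithms over four days.
forecasts = [
    [10.0, 12.0, 11.0, 13.0],
    [11.0, 13.0, 12.0, 14.0],
    [15.0, 12.5, 11.5, 13.5],
]
observed = [10.5, 12.2, 11.8, 13.1]  # hypothetical observed demand

mean_fc = combine(forecasts, "mean")
median_fc = combine(forecasts, "median")
print(pinball_loss(observed, mean_fc, 0.9))
print(pinball_loss(observed, median_fc, 0.9))
```

On the first day one algorithm forecasts far higher than the other two, so the mean combiner is pulled upward while the median combiner is not. That robustness to a single outlying forecast is the usual reason to consider both.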

algorithm, demand forecasting, water demand forecasting, (8 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Forecasting (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.63)


How to plot XGBoost trees in R - Open Source Automation

#artificialintelligence | Apr-28-2021, 00:03:13 GMT

In this post, we're going to cover how to plot XGBoost trees in R. XGBoost is a very popular machine learning algorithm, which is frequently used in Kaggle competitions and has many practical use cases. Let's start by loading the packages we'll need. Note that plotting XGBoost trees requires the DiagrammeR package to be installed, so even if you have xgboost installed already, you'll need to make sure you have DiagrammeR also. Next, let's read in our dataset. In this post, we'll be using this customer churn dataset. The label we'll be trying to predict is called "Exited" and is a binary variable with 1 meaning the customer churned (canceled account) vs. 0 meaning the customer did not churn (did not cancel account).

dataset, open source automation, plot xgboost tree, (3 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)


Infinitesimal gradient boosting

Dombry, Clément, Duchamps, Jean-Jil

arXiv.org Machine Learning | Apr-26-2021

We define infinitesimal gradient boosting as a limit of the popular tree-based gradient boosting algorithm from machine learning. The limit is considered in the vanishing-learning-rate asymptotic, that is, when the learning rate tends to zero and the number of gradient trees is rescaled accordingly. For this purpose, we introduce a new class of randomized regression trees bridging totally randomized trees and Extra Trees and using a softmax distribution for binary splitting. Our main result is the convergence of the associated stochastic algorithm and the characterization of the limiting procedure as the unique solution of a nonlinear ordinary differential equation in an infinite-dimensional function space. Infinitesimal gradient boosting defines a smooth path in the space of continuous functions along which the training error decreases, the residuals remain centered and the total variation is well controlled.
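One plausible reading of "a softmax distribution for binary splitting" is that a split is drawn at random from a set of candidates with probability proportional to the exponential of its score. The sketch below illustrates that reading only; the abstract does not give the paper's exact construction, and the inverse-temperature parameter `beta`, the scores and the function names are assumptions.

```python
import math
import random

def softmax_probs(scores, beta):
    """Softmax over candidate-split scores with inverse temperature beta."""
    m = max(beta * s for s in scores)  # subtract the max for numerical stability
    weights = [math.exp(beta * s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def sample_split(scores, beta, rng):
    """Draw the index of one candidate split from the softmax distribution."""
    probs = softmax_probs(scores, beta)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]

scores = [0.1, 0.5, 0.2]  # hypothetical scores of three candidate splits

uniform = softmax_probs(scores, beta=0.0)   # every candidate equally likely
sharp = softmax_probs(scores, beta=50.0)    # mass concentrates on the best score

rng = random.Random(0)
chosen = sample_split(scores, beta=50.0, rng=rng)
print(uniform, sharp, chosen)
```

Under this reading, `beta = 0` picks a candidate uniformly at random, as in totally randomized trees, while a large `beta` almost always picks the best-scoring candidate, which is closer to how Extra Trees choose among random candidate splits.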

equation, gradient, regression tree, (17 more...)

arXiv.org Machine Learning

2104.13208

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.92)


Tree-Based Machine Learning Algorithms

#artificialintelligence | Apr-15-2021, 11:00:38 GMT

The simplest model is the Decision Tree. A combination of Decision Trees builds a Random Forest. Random Forest usually has higher accuracy than Decision Tree does. Adaptive Boosting and Gradient Boosting Machine build a group of Decision Trees one after another, each learning from the errors of its predecessor. Adaptive Boosting and Gradient Boosting Machine can achieve better accuracy than Random Forest can. Extreme Gradient Boosting was created to compensate for the overfitting problem of Gradient Boosting. Thus, we can say that in general Extreme Gradient Boosting has the best accuracy amongst tree-based algorithms. Many say that Extreme Gradient Boosting wins many Machine Learning competitions.
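The sequential idea behind boosting can be sketched in a few lines: each new tree is fitted to the residuals left by the trees before it. The example below is a minimal gradient boosting sketch for squared error using one-split regression stumps on a single feature. It is an illustration under simplifying assumptions, not the implementation used by any particular library; the data and all names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimizes squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Boost stumps sequentially; each one fits the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        # For squared error, the residuals are the negative gradient.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.0, 3.2, 2.9]  # hypothetical targets with a step at x > 3
model = gradient_boost(xs, ys)
print(model(1), model(6))
```

A Random Forest, by contrast, trains its trees independently on resampled data and averages them, so no tree sees the errors of another.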

algorithm, decision tree, training dataset, (10 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Enabling Machine Learning Algorithms for Credit Scoring -- Explainable Artificial Intelligence (XAI) methods for clear understanding complex predictive models

Biecek, Przemysław, Chlebus, Marcin, Gajda, Janusz, Gosiewska, Alicja, Kozak, Anna, Ogonowski, Dominik, Sztachelski, Jakub, Wojewnik, Piotr

arXiv.org Artificial Intelligence | Apr-14-2021

Rapid development of advanced modelling techniques gives an opportunity to develop tools that are more and more accurate. However, as usual, everything comes with a price, and in this case the price to pay is to lose interpretability of a model while gaining accuracy and precision. For managers to control and effectively manage credit risk, and for regulators to be convinced of model quality, the price to pay is too high. In this paper, we show how to take credit scoring analytics to the next level: we present a comparison of various predictive models (logistic regression, logistic regression with weight of evidence transformations and modern artificial intelligence algorithms) and show that advanced tree-based models give the best results in predicting client default. What is even more important and valuable, we also show how to boost advanced models using techniques that make it possible to interpret them and make them more accessible for credit risk practitioners, resolving the crucial obstacle to widespread deployment of more complex 'black box' models like random forests, gradient boosted or extreme gradient boosted trees. All this is shown on a large dataset obtained from the Polish Credit Bureau, to which all the banks and most of the lending companies in the country report their credit files. In this paper, the data from lending companies were used. The paper then compares state-of-the-art best practices in credit risk modelling with new advanced modern statistical tools boosted by the latest developments in the field of interpretability and explainability of artificial intelligence algorithms. We believe that this is a valuable contribution when it comes to the presentation of different modelling tools, but what is even more important, it shows which methods might be used to gain insight into and understanding of AI methods in a credit risk context.

enabling machine learning algorithm, logistic regression, prediction, (10 more...)

arXiv.org Artificial Intelligence

2104.06735

Country:

Europe > Poland > Masovia Province > Warsaw (0.05)
Europe > Switzerland > Basel-City > Basel (0.05)
North America > United States > New York (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre:

Research Report > New Finding (0.56)
Research Report > Experimental Study (0.56)

Industry: Banking & Finance > Credit (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)


Conclusive Local Interpretation Rules for Random Forests

Mollas, Ioannis, Bassiliades, Nick, Tsoumakas, Grigorios

arXiv.org Artificial Intelligence | Apr-13-2021

In critical situations involving discrimination, gender inequality, economic damage, and even the possibility of casualties, machine learning models must be able to provide clear interpretations for their decisions. Otherwise, their obscure decision-making processes can lead to socioethical issues as they interfere with people's lives. In the aforementioned sectors, random forest algorithms thrive, so their ability to explain themselves is an obvious requirement. In this paper, we present LionForests, which builds on preliminary work of ours. LionForests is a random forest-specific interpretation technique, which provides rules as explanations. It is applicable from binary classification tasks to multi-class classification and regression tasks, and it is supported by a stable theoretical background. Experimentation, including sensitivity analysis and comparison with state-of-the-art techniques, is also performed to demonstrate the efficacy of our contribution. Finally, we highlight a unique property of LionForests, called conclusiveness, that provides interpretation validity and distinguishes it from previous techniques.

algorithm, dataset, prediction, (16 more...)

arXiv.org Artificial Intelligence

2104.0604

Country:

Asia > Singapore (0.04)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
Banking & Finance (0.67)
Law (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
