AITopics | Ensemble Learning


Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
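As a rough illustration of why combining models can help, the sketch below computes the accuracy of a majority vote over n classifiers that are each correct with probability p. Independence between the classifiers is an idealized assumption (real models trained on the same data are correlated), and the function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n independent models is correct.

    Assumes every model is correct with the same probability p and that
    the models' errors are independent (an idealization).
    """
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


# Compare a single model with small and larger ensembles at p = 0.7.
for n in (1, 3, 11):
    print(n, round(majority_vote_accuracy(n, 0.7), 4))
```

Under these assumptions, three models that are each right 70% of the time give a majority vote that is right 78.4% of the time, and the gain grows with the number of models.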


Performance Evaluation of Classification Models for Household Income, Consumption and Expenditure Data Set

Nigus, Mersha, Dorsewamy

arXiv.org Artificial Intelligence, Jun-18-2021

Food security is more prominent on the policy agenda today than it has been in the past, thanks to recent food shortages at both the regional and global levels as well as renewed promises from major donor countries to combat chronic hunger. One field where machine learning can be used is the classification of household food insecurity. In this study, we establish a robust methodology for classifying a household as food secure or food insecure using machine learning algorithms. We use ten classification algorithms: Gradient Boosting (GB), Random Forest (RF), Extra Tree (ET), Bagging, K-Nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), Ada Boost (AB) and Naive Bayes (NB). We build a data set for household food security status from HICE survey data, have it validated by domain experts, and then perform the classification tasks on it. All classifiers perform well across all performance metrics. The Random Forest and Gradient Boosting models perform best, each with a testing accuracy of 0.9997, while Bagging, Decision Tree, Ada Boost, Extra Tree, K-Nearest Neighbor, Logistic Regression, SVM and Naive Bayes score 0.9996, 0.9996, 0.9994, 0.95675, 0.9415, 0.8915, 0.7853 and 0.7595, respectively.

algorithm, classification, classifier, (15 more...)


2106.11055

Country:

Africa > Uganda (0.05)
Asia > India > Karnataka (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
(10 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Food & Agriculture (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)


Build XGBoost models with Amazon Redshift ML

#artificialintelligence, Jun-17-2021, 16:34:29 GMT

Amazon Redshift ML allows data analysts, developers, and data scientists to train machine learning (ML) models using SQL. In previous posts, we demonstrated how customers can use the automatic model training capability of Amazon Redshift to train their classification and regression models. Redshift ML provides several capabilities for data scientists. It allows you to create a model using SQL and specify your algorithm as XGBoost. It also lets you bring your pre-trained XGBoost model into Amazon Redshift for local inference.

create model command, data scientist, redshift ml, (11 more...)


Industry: Retail > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)


Explainable Machine Learning with LIME and H2O in R

#artificialintelligence, Jun-16-2021, 16:16:05 GMT

Welcome to this hands-on, guided introduction to Explainable Machine Learning with LIME and H2O in R. By the end of this project, you will be able to use the LIME and H2O packages in R for automatic and interpretable machine learning, build classification models quickly with H2O AutoML, and explain and interpret model predictions using LIME. Machine learning (ML) models such as Random Forests, Gradient Boosted Machines, Neural Networks, Stacked Ensembles, etc., are often considered black boxes. However, they are more accurate for predicting non-linear phenomena due to their flexibility. Experts agree that higher accuracy often comes at the price of interpretability, which is critical to business adoption, trust, and regulatory oversight (e.g., GDPR, Right to Explanation, etc.). As more industries, from healthcare to banking, adopt ML models, their predictions are being used to justify the cost of healthcare and to approve or deny loans.

explainable machine learning, interpretability, machine learning, (3 more...)


Country: North America (0.07)

Genre: Instructional Material (0.44)

Industry: Health & Medicine (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.60)


Random Forest Algorithm in Python from Scratch

#artificialintelligence, Jun-15-2021, 13:05:29 GMT

The intuition behind the random forest algorithm can be split into two big parts: the random part and the forest part. Let us start with the latter. A forest in real life is made up of a bunch of trees. A random forest classifier is made up of a bunch of decision tree classifiers (abbreviated DT here and throughout the text). The exact number of DTs that make up the whole forest is set by the n_estimators variable mentioned earlier.
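The "random" and "forest" parts can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the article's implementation: each tree is reduced to a one-split decision stump to keep the example short, and apart from `n_estimators` (named in the text) every function and parameter name, including `max_features`, is hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Fit a one-split tree: pick the (feature, threshold) with fewest errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(X, y, n_estimators=25, max_features=1, seed=0):
    """The 'random' part: bootstrap rows and a random feature subset per tree."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), max_features)  # features this tree may use
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """The 'forest' part: every tree votes and the majority class wins."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters in two dimensions.
X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8], [0.8, 1.0], [1.0, 0.7]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y, n_estimators=25)
print(predict_forest(forest, [0.05, 0.05]), predict_forest(forest, [2.0, 2.0]))
```

A full implementation would grow each tree recursively instead of stopping at one split; the bootstrap sampling and the vote are the parts that carry over unchanged.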

decision tree, prediction, random forest algorithm, (6 more...)


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.92)


RFpredInterval: An R Package for Prediction Intervals with Random Forests and Boosted Forests

Alakus, Cansu, Larocque, Denis, Labbe, Aurelie

arXiv.org Machine Learning, Jun-15-2021

Like many predictive models, random forests provide a point prediction for a new observation. Besides the point prediction, it is important to quantify the uncertainty in the prediction. Prediction intervals provide information about the reliability of the point predictions. We have developed a comprehensive R package, RFpredInterval, that integrates 16 methods to build prediction intervals with random forests and boosted forests. The methods implemented in the package are a new method to build prediction intervals with boosted forests (PIBF) and 15 different variants to produce prediction intervals with random forests proposed by Roy and Larocque (2020). We perform an extensive simulation study and apply real data analyses to compare the performance of the proposed method to ten existing methods to build prediction intervals with random forests. The results show that the proposed method is very competitive and, globally, it outperforms the competing methods.
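To make the idea of a prediction interval concrete, here is a minimal sketch of one generic construction: shift the point prediction by empirical quantiles of out-of-bag residuals (observed value minus out-of-bag prediction). It is not claimed to match any of the package's 16 methods or its API; the function names and the residual values are hypothetical.

```python
def empirical_quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def residual_prediction_interval(point_prediction, oob_residuals, alpha=0.05):
    """Interval with nominal coverage 1 - alpha around a point prediction.

    Assumes the out-of-bag residuals are representative of the error the
    forest will make on the new observation.
    """
    lower = point_prediction + empirical_quantile(oob_residuals, alpha / 2)
    upper = point_prediction + empirical_quantile(oob_residuals, 1 - alpha / 2)
    return lower, upper


# Hypothetical out-of-bag residuals from an already fitted forest.
residuals = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
print(residual_prediction_interval(10.0, residuals, alpha=0.5))
```

A wider residual distribution produces a wider interval, which is how the interval conveys the reliability of the point prediction.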

pi length, prediction interval, random forest, (13 more...)


2106.08217

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Iowa > Story County > Ames (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Analysis of the Evolution of Parametric Drivers of High-End Sea-Level Hazards

Hough, Alana, Wong, Tony E.

arXiv.org Artificial Intelligence, Jun-10-2021

Climate models are critical tools for developing strategies to manage the risks posed by sea-level rise to coastal communities. While these models are necessary for understanding climate risks, there is a level of uncertainty inherent in each parameter in the models. This model parametric uncertainty leads to uncertainty in future climate risks. Consequently, there is a need to understand how those parameter uncertainties impact our assessment of future climate risks and the efficacy of strategies to manage them. Here, we use random forests to examine the parametric drivers of future climate risk and how the relative importances of those drivers change over time. We find that the equilibrium climate sensitivity and a factor that scales the effect of aerosols on radiative forcing are consistently the most important climate model parametric uncertainties throughout the 2020 to 2150 interval for both low and high radiative forcing scenarios. The near-term hazards of high-end sea-level rise are driven primarily by thermal expansion, while the longer-term hazards are associated with mass loss from the Antarctic and Greenland ice sheets. Our results highlight the practical importance of considering time-evolving parametric uncertainties when developing strategies to manage future climate risks.

artificial intelligence, machine learning, random forest, (17 more...)


doi: 10.5194/ascmo-8-117-2022

2106.12041

Country:

North America > Greenland (0.25)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
(10 more...)

Genre: Research Report > New Finding (0.48)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.38)


GBHT: Gradient Boosting Histogram Transform for Density Estimation

Cui, Jingyi, Hang, Hanyuan, Wang, Yisen, Lin, Zhouchen

arXiv.org Machine Learning, Jun-10-2021

In this paper, we propose a density estimation algorithm called Gradient Boosting Histogram Transform (GBHT), where we adopt the Negative Log Likelihood as the loss function to make the boosting procedure available for unsupervised tasks. From a learning theory viewpoint, we first prove fast convergence rates for GBHT with the smoothness assumption that the underlying density function lies in the space $C^{0,\alpha}$. Then, when the target density function lies in the space $C^{1,\alpha}$, we present an upper bound for GBHT which is smaller than the lower bound of its corresponding base learner, in the sense of convergence rates. To the best of our knowledge, we make the first attempt to theoretically explain why boosting can enhance the performance of its base learners for density estimation problems. In experiments, we not only conduct performance comparisons with the widely used KDE, but also apply GBHT to anomaly detection to showcase a further application of GBHT.
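For reference, the negative log-likelihood of a density estimate takes the following generic form. The symbols are assumptions for illustration ($f$ for the estimated density, $x_1, \dots, x_n$ for the observed samples), and the paper's exact notation and normalization may differ.

```latex
\mathcal{L}(f) = -\frac{1}{n} \sum_{i=1}^{n} \log f(x_i)
```

Because this loss depends only on the samples and not on labels, it gives a boosting procedure an objective to minimize in the unsupervised setting.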

density estimation, gbht, histogram transform, (12 more...)


2106.05738

Country:

North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > Canada > Newfoundland and Labrador > Labrador (0.04)
Europe > Netherlands (0.04)
(2 more...)

Genre: Research Report > New Finding (0.45)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)


Yes, XGBoost is cool, but have you heard of CatBoost?

#artificialintelligence, Jun-8-2021, 14:45:13 GMT

If you've worked as a data scientist, competed in Kaggle competitions, or even browsed data science articles on the internet, there's a high chance that you've heard of XGBoost. Even today, it is often the go-to algorithm for many Kagglers and data scientists working on general machine learning tasks. While XGBoost is popular for good reasons, it does have some limitations, which I mentioned in my article below. Odds are you've heard of XGBoost, but have you ever heard of CatBoost? CatBoost is another open-source gradient boosting library, created by researchers at Yandex.

catboost, dataset, xgboost, (17 more...)


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)


TabNet: The End of Gradient Boosting?

#artificialintelligence, Jun-7-2021, 12:55:37 GMT

Gradient Boosting models such as XGBoost, LightGBM and CatBoost have long been considered best in class for tabular data. Even with rapid progress in NLP and Computer Vision, Neural Networks are still routinely surpassed by tree-based models on tabular data. Enter Google's TabNet in 2019. According to the paper, this Neural Network was able to outperform the leading tree-based models across a variety of benchmarks. Not only that, it is considerably more explainable than boosted tree models, as it has built-in explainability.

explainability, neural network, tabnet, (1 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)


Multivariate Probabilistic Regression with Natural Gradient Boosting

O'Malley, Michael, Sykulski, Adam M., Lumpkin, Rick, Schuler, Alejandro

arXiv.org Machine Learning, Jun-7-2021

Many single-target regression problems require estimates of uncertainty along with the point predictions. Probabilistic regression algorithms are well-suited for these tasks. However, the options are much more limited when the prediction target is multivariate and a joint measure of uncertainty is required. For example, in predicting a 2D velocity vector, a joint uncertainty would quantify the probability of any vector in the plane, which would be more expressive than two separate uncertainties on the x- and y-components. To enable joint probabilistic regression, we propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution. Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches. We demonstrate these claims in simulation and with a case study predicting two-dimensional oceanographic velocity data. An implementation of our method is available at https://github.com/stanfordmlgroup/ngboost.
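The difference between a joint uncertainty and two separate ones can be illustrated with a bivariate Gaussian over a 2D velocity. This is a sketch under assumptions chosen for illustration (a Gaussian predictive distribution, zero mean, unit standard deviations, correlation 0.8); it does not use the NGBoost API, and the function name is hypothetical.

```python
import math


def bivariate_normal_pdf(x, y, mean, std, rho):
    """Density of a 2D Gaussian with component correlation rho (|rho| < 1)."""
    zx = (x - mean[0]) / std[0]
    zy = (y - mean[1]) / std[1]
    one_minus_r2 = 1.0 - rho * rho
    norm = 2.0 * math.pi * std[0] * std[1] * math.sqrt(one_minus_r2)
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_r2
    return math.exp(-0.5 * quad) / norm


mean, std = (0.0, 0.0), (1.0, 1.0)

# Joint model: the x- and y-components are positively correlated.
joint_same_sign = bivariate_normal_pdf(1.0, 1.0, mean, std, rho=0.8)
joint_opposite_sign = bivariate_normal_pdf(1.0, -1.0, mean, std, rho=0.8)

# Two separate uncertainties are equivalent to assuming rho = 0.
separate_same_sign = bivariate_normal_pdf(1.0, 1.0, mean, std, rho=0.0)
separate_opposite_sign = bivariate_normal_pdf(1.0, -1.0, mean, std, rho=0.0)

print(joint_same_sign > joint_opposite_sign)
print(math.isclose(separate_same_sign, separate_opposite_sign))
```

The joint model rates the velocity (1, 1) as more plausible than (1, -1), while two separate per-component uncertainties cannot tell the two apart.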

application, gradient, matrix, (14 more...)


2106.03823

Country:

Atlantic Ocean > North Atlantic Ocean (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)
