AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
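As a rough illustration of the definition above, the following minimal sketch combines three toy classifiers by majority vote. The classifiers, the data, and all function names are hypothetical and chosen only to show how uncorrelated errors can cancel out; they do not come from any of the listed works.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the base-model predictions."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(models, x):
    """Ask every base model for a label and combine the labels by majority vote."""
    return majority_vote([model(x) for model in models])


# Three hypothetical weak classifiers for the target "is x >= 5?".
# Each one is wrong on exactly one input, and the mistakes do not overlap.
def model_a(x):
    return int(x >= 4)  # wrong at x = 4


def model_b(x):
    return int(x >= 6)  # wrong at x = 5


def model_c(x):
    return 0 if x == 8 else int(x >= 5)  # wrong at x = 8


def accuracy(predict, data):
    """Fraction of (input, label) pairs that `predict` labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


data = [(x, int(x >= 5)) for x in range(10)]
models = [model_a, model_b, model_c]
ensemble_accuracy = accuracy(lambda x: ensemble_predict(models, x), data)
individual_accuracies = [accuracy(m, data) for m in models]
```

On this toy data each base model misclassifies one of the ten inputs, while the vote is correct on all of them because no two models err on the same input.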

Hurricane Forecasting: A Novel Multimodal Machine Learning Framework

Boussioux, Léonard, Zeng, Cynthia, Guénais, Théo, Bertsimas, Dimitris

arXiv.org Artificial Intelligence, Nov-11-2020

This paper describes a machine learning (ML) framework for tropical cyclone intensity and track forecasting, combining multiple distinct ML techniques and utilizing diverse data sources. Our framework, which we refer to as Hurricast (HURR), is built upon the combination of distinct data processing techniques using gradient-boosted trees and novel encoder-decoder architectures, including CNN, GRU, and Transformer components. We propose a deep-feature extractor methodology to mix spatial-temporal data with statistical data efficiently. Our multimodal framework makes forecasts from a wide range of data sources, including historical storm data, reanalysis atmospheric images, and operational forecasts. Evaluated against current operational forecasts in the North Atlantic and Eastern Pacific basins on the last years of available data, our models consistently outperform statistical-dynamical models and, albeit less accurate than the best dynamical models, compute forecasts in seconds. Furthermore, the inclusion of Hurricast into an operational forecast consensus model leads to a significant improvement of 5% to 15% over NHC's official forecast, thus highlighting its complementarity with existing approaches. In summary, our work demonstrates that combining different data sources and distinct machine learning methodologies can lead to superior tropical cyclone forecasting.

architecture, forecast, forecasting, (15 more...)

arXiv: 2011.06125

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom (0.04)
(6 more...)

Genre: Research Report > New Finding (0.66)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.88)

Margins are Insufficient for Explaining Gradient Boosting

Grønlund, Allan, Kamma, Lior, Larsen, Kasper Green

arXiv.org Machine Learning, Nov-10-2020

Boosting is one of the most successful ideas in machine learning, achieving great practical performance with little fine-tuning. The success of boosted classifiers is most often attributed to improvements in margins. The focus on margin explanations was pioneered in the seminal work by Schapire et al. (1998) and has culminated in the $k$'th margin generalization bound by Gao and Zhou (2013), which was recently proved to be near-tight for some data distributions (Grønlund et al., 2019). In this work, we first demonstrate that the $k$'th margin bound is inadequate in explaining the performance of state-of-the-art gradient boosters. We then explain the shortcomings of the $k$'th margin bound and prove a stronger and more refined margin-based generalization bound for boosted classifiers that indeed succeeds in explaining the performance of modern gradient boosters. Finally, we improve upon the recent generalization lower bound by Grønlund et al. (2019).
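To make the notion of a margin concrete, here is a minimal sketch that computes the normalized margins of a weighted voting classifier and picks out the $k$'th smallest one. It assumes the usual convention of labels in {-1, +1} and margin $y \sum_t \alpha_t h_t(x) / \sum_t |\alpha_t|$; the stumps, weights, and data are hypothetical, and the code does not reproduce the paper's bounds or experiments.

```python
def margins(alphas, hypotheses, data):
    """Normalized margin y * sum_t(alpha_t * h_t(x)) / sum_t(|alpha_t|) per example."""
    total = sum(abs(a) for a in alphas)
    return [
        y * sum(a * h(x) for a, h in zip(alphas, hypotheses)) / total
        for x, y in data
    ]


def kth_margin(alphas, hypotheses, data, k):
    """The k'th smallest margin over the data set (k is 1-indexed)."""
    return sorted(margins(alphas, hypotheses, data))[k - 1]


def stump(threshold):
    """Hypothetical base classifier: +1 if x > threshold, else -1."""
    return lambda x: 1 if x > threshold else -1


# Hypothetical voting classifier: three stumps with fixed weights.
hypotheses = [stump(0), stump(2), stump(-2)]
alphas = [0.5, 0.25, 0.25]

# Toy 1-D data with labels in {-1, +1}.
data = [(-3, -1), (-1, -1), (1, 1), (3, 1)]

all_margins = margins(alphas, hypotheses, data)
smallest_margin = kth_margin(alphas, hypotheses, data, k=1)
third_margin = kth_margin(alphas, hypotheses, data, k=3)
```

A positive margin means the example is classified correctly, and a margin close to 1 means nearly all of the voting weight agrees with the label.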

generalization, prediction, probability, (15 more...)

arXiv: 2011.04998

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

The Macroeconomy as a Random Forest

Coulombe, Philippe Goulet

arXiv.org Machine Learning, Nov-8-2020

I develop Macroeconomic Random Forest (MRF), an algorithm adapting the canonical Machine Learning (ML) tool to flexibly model evolving parameters in a linear macro equation. Its main output, Generalized Time-Varying Parameters (GTVPs), is a versatile device nesting many popular nonlinearities (threshold/switching, smooth transition, structural breaks/change) and allowing for sophisticated new ones. The approach delivers clear forecasting gains over numerous alternatives, predicts the 2008 drastic rise in unemployment, and performs well for inflation. Unlike most ML-based methods, MRF is directly interpretable -- via its GTVPs. For instance, the successful unemployment forecast is due to the influence of forward-looking variables (e.g., term spreads, housing starts) nearly doubling before every recession. Interestingly, the Phillips curve has indeed flattened, and its might is highly cyclical.

forecast, inflation, recession, (16 more...)

arXiv: 2006.12724

Country:

Europe > Netherlands (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Quebec (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Banking & Finance > Economy (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Residual Likelihood Forests

Zuo, Yan, Drummond, Tom

arXiv.org Machine Learning, Nov-3-2020

Ensemble and Boosting methods such as Random Forests [3] and AdaBoost [19] are often recognized as some of the best out-of-the-box classifiers, consistently achieving state-of-the-art performance across a wide range of computer vision tasks including applications in image classification [1], semantic segmentation [22], object recognition [12] and data clustering [16]. The success of these methods is attributed to their ability to learn models (strong learners) which possess low bias and variance through the combination of weakly correlated learners (weak learners). Forests reduce variance by averaging their weak learners over the ensemble. Boosting, on the other hand, looks towards reducing both bias and variance through sequentially optimizing under conditional constraints. The commonality between both approaches is in the way each learner is constructed: both methods use a top-down induction algorithm (such as CART [4]) which greedily learns decision nodes in a recursive manner. This approach is known to be suboptimal in terms of objective maximization as there are no guarantees that a global loss is being minimized [14]. In practice, this type of optimization requires the non-linearity offered by several (very) deep trees, which results in redundancy in learned models with large overlaps of information between weak learners. To address these limitations, the ensemble approaches of [11, 20] have utilized gradient information within a boosting framework. This allows weak learners to be fit via pseudo-residuals or to a set of adaptive weights and allows for the minimization of a global loss via gradient descent.
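The idea of fitting weak learners to pseudo-residuals can be sketched in a few lines. The example below is a minimal, hypothetical illustration of generic gradient boosting with one-split regression stumps under squared loss on 1-D inputs; it is not the Residual Likelihood Forests method, and the function names, learning rate, and data are assumptions made for the sketch.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:  # the largest x leaves nothing on the right side
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = sum((r - left_mean) ** 2 for r in left)
        err += sum((r - right_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds=50, lr=0.5):
    """Additive model: mean prediction plus shrunken stumps fit to pseudo-residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - p)^2 the negative gradient is simply y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        s = fit_stump(xs, residuals)
        stumps.append(s)
        preds = [p + lr * s(x) for x, p in zip(xs, preds)]
    return lambda x: base + lr * sum(s(x) for s in stumps)


def mse(predict, xs, ys):
    """Mean squared error of `predict` on the given data."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: a noisy step function.
xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]

model = gradient_boost(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_mse = mse(lambda x: mean_y, xs, ys)
boosted_mse = mse(model, xs, ys)
```

Each round takes a small step along the negative gradient of the global loss, so the training error of the additive model falls below that of the constant baseline it starts from.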

artificial intelligence, machine learning, weak learner, (16 more...)

arXiv: 2011.02086

Country:

Oceania > Australia (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Brain Predictability toolbox: a Python library for neuroimaging based machine learning

Hahn, Sage, Yuan, Dekang, Thompson, Wesley, Owens, Max M, Allgaier, Nicholas, Garavan, Hugh

arXiv.org Machine Learning, Nov-3-2020

Summary: Brain Predictability toolbox (BPt) represents a unified framework of machine learning (ML) tools designed to work with both tabulated data (in particular brain, psychiatric, behavioral, and physiological variables) and neuroimaging-specific derived data (e.g., brain volumes and surfaces). This package is suitable for investigating a wide range of different neuroimaging-based ML questions, in particular, those queried from large human datasets.

Availability and Implementation: BPt has been developed as an open-source Python 3.6+ package hosted at https://github.com/sahahn/BPt under MIT License, with documentation provided at https://bpt.readthedocs.io/en/latest/, and continues to be actively developed. The project can be downloaded through the GitHub link provided. A web GUI based on the same code is currently under development and can be set up through Docker with instructions at https://github.com/sahahn/BPt_app.

Contact: Please contact Sage Hahn at sahahn@uvm.edu

artificial intelligence, library, machine learning, (13 more...)

arXiv: 2011.01715

Country:

North America > United States > Vermont > Chittenden County > Burlington (0.15)
North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > California > San Diego County > La Jolla (0.05)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.98)
Health & Medicine > Health Care Technology (0.98)
Health & Medicine > Diagnostic Medicine > Imaging (0.98)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.30)

Imbalanced-learn: Handling imbalanced class problem

#artificialintelligence, Oct-29-2020, 06:15:12 GMT

In the previous article, we went through the different methods for dealing with imbalanced data. In this article, let us try to understand how to use the imbalanced-learn library to deal with imbalanced class problems. We will make use of the PyCaret library and UCI's default of credit card clients dataset, which is also built into PyCaret. Imbalanced-learn is a Python package that provides a number of re-sampling techniques to deal with class imbalance problems commonly encountered in classification tasks. Note that imbalanced-learn is compatible with scikit-learn and is also part of the scikit-learn-contrib projects.
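One of the simplest re-sampling techniques is random over-sampling: minority-class rows are duplicated at random until every class is as frequent as the largest one. The following standard-library sketch illustrates the idea only. The function `random_oversample` and the toy data are hypothetical and are not the imbalanced-learn API, which provides a comparable `RandomOverSampler` class with a `fit_resample` method.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until all classes have equal counts."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Rows of the original data that carry this label.
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data: eight rows of class 0, two rows of class 1.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)
class_counts = Counter(y_res)
```

Re-sampling of this kind should be applied to the training split only, so that duplicated rows never leak into the data used for evaluation.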

artificial intelligence, machine learning, over-sampling, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.31)

Targeting for long-term outcomes

Yang, Jeremy, Eckles, Dean, Dhillon, Paramveer, Aral, Sinan

arXiv.org Machine Learning, Oct-29-2020

Decision-makers often want to target interventions (e.g., marketing campaigns) so as to maximize an outcome that is observed only in the long term. This typically requires delaying decisions until the outcome is observed or relying on simple short-term proxies for the long-term outcome. Here we build on the statistical surrogacy and off-policy learning literature to impute the missing long-term outcomes and then approximate the optimal targeting policy on the imputed outcomes via a doubly-robust approach. We apply our approach in large-scale proactive churn management experiments at The Boston Globe by targeting optimal discounts to its digital subscribers to maximize their long-term revenue. We first show that conditions for validity of average treatment effect estimation with imputed outcomes are also sufficient for valid policy evaluation and optimization; furthermore, these conditions can be somewhat relaxed for policy optimization. We then validate this approach empirically by comparing it with a policy learned on the ground truth long-term outcomes and show that they are statistically indistinguishable. Our approach also outperforms a policy learned on short-term proxies for the long-term outcome. In a second field experiment, we implement the optimal targeting policy with additional randomized exploration, which allows us to update the optimal policy for each new cohort of customers to account for potential non-stationarity. Over three years, our approach had a net-positive revenue impact in the range of $4-5 million compared to The Boston Globe's current policies.

artificial intelligence, machine learning, subscriber, (18 more...)

arXiv: 2010.15835

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Michigan (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.93)
Media > News (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.94)
Information Technology > Data Science (0.92)

Complete Guide To XGBoost With Implementation In R

#artificialintelligence, Oct-27-2020, 05:30:32 GMT

In recent times, ensemble techniques have become popular among data scientists and enthusiasts. Until recently, Random Forest and Gradient Boosting algorithms were winning data science competitions and hackathons, but over the last few years XGBoost has been performing better than other algorithms on problems involving structured data. Apart from its performance, XGBoost is also recognized for its speed, accuracy, and scale. XGBoost is developed on the framework of Gradient Boosting. Just like other boosting algorithms, XGBoost uses decision trees for its ensemble model.

artificial intelligence, machine learning, xgboost, (10 more...)

Genre: Contests & Prizes (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Pay as you go machine learning inference with AWS Lambda

#artificialintelligence, Oct-26-2020, 22:16:27 GMT

This post is courtesy of Eitan Sela, Senior Startup Solutions Architect. Many customers want to deploy machine learning models for real-time inference, and pay only for what they use. Using Amazon EC2 instances for real-time inference may not be cost effective to support sporadic inference requests throughout the day. AWS Lambda is a serverless compute service with pay-per-use billing. However, ML frameworks like XGBoost are too large to fit into the 250 MB application artifact size limit, or the 512 MB /tmp space limit.

artificial intelligence, lambda function, machine learning, (16 more...)

Country: North America > United States > Wisconsin (0.05)

Industry:

Retail > Online (0.40)
Health & Medicine > Therapeutic Area > Oncology (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.42)

Versatile Verification of Tree Ensembles

Devos, Laurens, Meert, Wannes, Davis, Jesse

arXiv.org Artificial Intelligence, Oct-26-2020

Machine learned models often must abide by certain requirements (e.g., fairness or legal). This has spurred interest in developing approaches that can provably verify whether a model satisfies certain properties. This paper introduces a generic algorithm called Veritas that enables tackling multiple different verification tasks for tree ensemble models like random forests (RFs) and gradient boosting decision trees (GBDTs). This generality contrasts with previous work, which has focused exclusively on either adversarial example generation or robustness checking. Veritas formulates the verification task as a generic optimization problem and introduces a novel search space representation. Veritas offers two key advantages. First, it provides anytime lower and upper bounds when the optimization problem cannot be solved exactly. In contrast, many existing methods have focused on exact solutions and are thus limited by the verification problem being NP-complete. Second, Veritas produces full (bounded suboptimal) solutions that can be used to generate concrete examples. We experimentally show that Veritas outperforms the previous state of the art by (a) generating exact solutions more frequently, (b) producing tighter bounds when (a) is not possible, and (c) offering orders of magnitude speedups. Subsequently, Veritas enables tackling more and larger real-world verification scenarios.

artificial intelligence, ensemble, machine learning, (19 more...)

arXiv: 2010.13880

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
North America > United States > California (0.05)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)
