AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

Lale: Consistent Automated Machine Learning

Baudart, Guillaume, Hirzel, Martin, Kate, Kiran, Ram, Parikshit, Shinnar, Avraham

arXiv.org Artificial Intelligence, Jul-3-2020

Automated machine learning makes it easier for data scientists to develop pipelines by searching over possible choices for hyperparameters, algorithms, and even pipeline topologies. Unfortunately, the syntax for automated machine learning tools is inconsistent with manual machine learning, with each other, and with error checks. Furthermore, few tools support advanced features such as topology search or higher-order operators. This paper introduces Lale, a library of high-level Python interfaces that simplifies and unifies automated machine learning in a consistent way.

artificial intelligence, machine learning, operator, (17 more...)

arXiv.org Artificial Intelligence

2007.01977

Country: North America > United States (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Artificial Intelligence revolutionizes the insurance industry

#artificialintelligence, Jul-2-2020, 16:26:55 GMT

Pricing: Through predictive models (with algorithms such as random forest, linear regression, xgboost, etc.), we can set insurance premiums in a more dynamic and precise way. More specifically, premiums can be personalized according to driving habits, geographic area and commute distance. A new set of variables is added to the traditional price-setting variables to improve the profitability of the portfolio. These variables depend on the company's own needs and capacities and can range from competitors' prices to the policyholder's traffic record, driver's license age, credit score, as well as external data systems and sources. The interesting thing here is the dynamism in determining the price: the models change based on data input over time, then recognize patterns and adjust the rate autonomously.

artificial intelligence revolutionize, insurance industry, machine learning

#artificialintelligence

Industry: Banking & Finance > Insurance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)

Uncertainty in Gradient Boosting via Ensembles

Ustimenko, Aleksei, Prokhorenkova, Liudmila, Malinin, Andrey

arXiv.org Machine Learning, Jul-2-2020

Gradient boosting is a powerful machine learning technique that is particularly successful for tasks containing heterogeneous features and noisy data. While gradient boosting classification models return a distribution over class labels, regression models typically yield only point predictions. However, for many practical, high-risk applications, it is also important to be able to quantify uncertainty in the predictions to avoid costly mistakes. In this work, we examine a probabilistic ensemble-based framework for deriving uncertainty estimates in the predictions of gradient boosting classification and regression models. Crucially, the proposed approach allows the total uncertainty to be decomposed into data uncertainty, which comes from the complexity and noise in the data distribution, and knowledge uncertainty, which comes from the lack of information about a given region of the feature space. Two approaches for generating ensembles are considered: Stochastic Gradient Boosting (SGB) and Stochastic Gradient Langevin Boosting (SGLB). Notably, SGLB also enables the generation of a virtual ensemble via only one gradient boosting model, which significantly reduces complexity. Experiments on a range of regression and classification datasets show that ensembles of gradient boosting models yield improved predictive performance, and measures of uncertainty successfully enable detection of out-of-domain inputs.
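
The abstract does not spell out how the decomposition is computed. A minimal sketch for classification, assuming the entropy-based decomposition commonly used for ensembles: total uncertainty is the entropy of the averaged prediction, data uncertainty is the average entropy of the individual members, and knowledge uncertainty is the difference between the two. The function names and the toy probabilities below are hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty for a single input.

    member_probs: one class distribution per ensemble member.
    Returns (total, data, knowledge), assuming the entropy-based
    decomposition: knowledge = total - data.
    """
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    # Average the members' predicted distributions class by class.
    mean_probs = [
        sum(member[c] for member in member_probs) / n_models
        for c in range(n_classes)
    ]
    total = entropy(mean_probs)
    data = sum(entropy(member) for member in member_probs) / n_models
    return total, data, total - data


# Hypothetical ensembles: members that agree on a noisy input,
# and members that are individually confident but disagree.
agreeing = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
disagreeing = [[0.99, 0.01], [0.01, 0.99], [0.5, 0.5]]

for name, ensemble in (("agreeing", agreeing), ("disagreeing", disagreeing)):
    total, data, knowledge = decompose_uncertainty(ensemble)
    print(f"{name}: total={total:.3f} data={data:.3f} knowledge={knowledge:.3f}")
```

Under this assumption, members that agree on an ambiguous input produce high data uncertainty and near-zero knowledge uncertainty, while confident members that disagree produce high knowledge uncertainty, which is the signal one would expect to be useful for flagging out-of-domain inputs.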

artificial intelligence, ensemble, machine learning, (19 more...)

arXiv.org Machine Learning

2006.10562

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Introducing the open-source Amazon SageMaker XGBoost algorithm container

#artificialintelligence, Jun-29-2020, 13:50:06 GMT

XGBoost is a popular and efficient machine learning (ML) algorithm for regression and classification tasks on tabular datasets. It implements a technique known as gradient boosting on trees and performs remarkably well in ML competitions. Since its launch, Amazon SageMaker has supported XGBoost as a built-in managed algorithm. For more information, see Simplify machine learning with XGBoost and Amazon SageMaker. As of this writing, you can take advantage of the open-source Amazon SageMaker XGBoost container, which has improved flexibility, scalability, extensibility, and Managed Spot Training.

artificial intelligence, container, machine learning, (15 more...)

#artificialintelligence

Genre: Press Release (0.42)

Industry: Retail > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Do Decision Trees need Feature Scaling?

#artificialintelligence, Jun-28-2020, 16:55:10 GMT

Machine learning algorithms have been evolving since the field's inception. Today the domain has come a long way, from mathematical modelling to ensemble modelling and more. This evolution has produced more robust, state-of-the-art (SOTA) models that are almost bridging the gap between the potential capabilities of humans and AI. Ensemble modelling has given us one of those SOTA models, XGBoost. Recently I happened to participate in a Machine Learning Hiring Challenge where the problem statement was a classification problem.

artificial intelligence, feature scaling, machine learning, (9 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.43)

DriveML: Self-Drive Machine Learning Projects

#artificialintelligence, Jun-27-2020, 03:00:46 GMT

Implements some of the pillars of an automated machine learning pipeline, such as (i) automated data preparation, (ii) feature engineering, (iii) model building in a classification context, which includes techniques such as (a) regularised regression [1], (b) logistic regression [2], (c) random forest [3], (d) decision tree [4] and (e) extreme gradient boosting (xgboost) [5], and finally (iv) model explanation (using lift charts and partial dependency plots). It also provides some additional features such as generating missing at random (MAR) variables and automated exploratory data analysis. Moreover, the function exports the model results with the required plots in an HTML vignette report format that follows the best practices of industry and academia.

artificial intelligence, decision tree learning, self-drive machine learning project, (1 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Engineering Blog - Learnings from Distributed XGBoost on Amazon SageMaker

#artificialintelligence, Jun-25-2020, 09:06:10 GMT

XGBoost is a popular Python library for gradient boosted decision trees. The implementation allows practitioners to distribute training across multiple compute instances (or workers), which is especially useful for large training sets. One tool used at Zalando for deploying production machine learning models is the managed service from Amazon called SageMaker. XGBoost is already included in SageMaker as a built-in algorithm, meaning that a prebuilt docker container is available. This container also supports distributed training, making it easy to scale training jobs across many instances.

artificial intelligence, machine learning, training time, (17 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation

Fakoor, Rasool, Mueller, Jonas, Erickson, Nick, Chaudhari, Pratik, Smola, Alexander J.

arXiv.org Machine Learning, Jun-25-2020

Automated machine learning (AutoML) can produce complex model ensembles by stacking, bagging, and boosting many individual models like trees, deep networks, and nearest neighbor estimators. While highly accurate, the resulting predictors are large, slow, and opaque as compared to their constituents. To improve the deployment of AutoML on tabular data, we propose FAST-DAD to distill arbitrarily complex ensemble predictors into individual models like boosted trees, random forests, and deep networks. At the heart of our approach is a data augmentation strategy based on Gibbs sampling from a self-attention pseudolikelihood estimator. Across 30 datasets spanning regression and binary/multiclass classification tasks, FAST-DAD distillation produces significantly better individual models than one obtains through standard training on the original data. Our individual distilled models are over 10x faster and more accurate than ensemble predictors produced by AutoML tools like H2O/AutoSklearn.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

2006.14284

Country:

North America > Mexico > Gulf of Mexico (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(2 more...)

Design and Evaluation of Personalized Free Trials

Yoganarasimhan, Hema, Barzegary, Ebrahim, Pani, Abhishek

arXiv.org Machine Learning, Jun-23-2020

Free trial promotions, where users are given a limited time to try the product for free, are a commonly used customer acquisition strategy in the Software as a Service (SaaS) industry. We examine how trial length affects users' responsiveness, and seek to quantify the gains from personalizing the length of the free trial promotions. Our data come from a large-scale field experiment conducted by a leading SaaS firm, where new users were randomly assigned to a 7-, 14-, or 30-day free trial. First, we show that giving the 7-day trial to all consumers is the best uniform policy, with a 5.59% increase in subscriptions. Next, we develop a three-pronged framework for personalized policy design and evaluation. Using our framework, we develop seven personalized targeting policies based on linear regression, lasso, CART, random forest, XGBoost, causal tree, and causal forest, and evaluate their performance using the Inverse Propensity Score (IPS) estimator. We find that the personalized policy based on lasso performs the best, followed by the one based on XGBoost. In contrast, policies based on causal tree and causal forest perform poorly. We then link a method's effectiveness in designing policy with its ability to personalize the treatment sufficiently without over-fitting (i.e., capturing spurious heterogeneity). Next, we segment consumers based on their optimal trial length and derive some substantive insights on the drivers of user behavior in this context. Finally, we show that policies designed to maximize short-run conversions also perform well on long-run outcomes such as consumer loyalty and profitability.
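
To make the evaluation step concrete, here is a minimal sketch of an Inverse Propensity Score estimate of a targeting policy's value from randomized logs. It is an illustration under stated assumptions, not the authors' implementation: the log records, the `segment` feature, the two example policies, and the equal assignment probabilities of 1/3 per arm are all hypothetical (the abstract says only that assignment was random).

```python
def ips_value(logs, policy, propensity):
    """Estimate the average outcome a policy would achieve.

    logs: list of (features, assigned_arm, outcome) from the experiment.
    policy: function mapping features to the arm it would assign.
    propensity: dict mapping each arm to its assignment probability.
    Only records where the policy agrees with the logged arm contribute,
    each reweighted by the inverse of that arm's assignment probability.
    """
    total = 0.0
    for features, arm, outcome in logs:
        if policy(features) == arm:
            total += outcome / propensity[arm]
    return total / len(logs)


# Hypothetical logs: (features, trial length in days, subscribed 0/1).
logs = [
    ({"segment": "new"}, 7, 1),
    ({"segment": "new"}, 14, 0),
    ({"segment": "pro"}, 30, 1),
    ({"segment": "pro"}, 7, 0),
    ({"segment": "new"}, 30, 0),
    ({"segment": "pro"}, 14, 1),
]

# Assumed equal randomization across the three arms.
propensity = {7: 1 / 3, 14: 1 / 3, 30: 1 / 3}


def uniform_7_days(features):
    """Uniform policy: everyone gets the 7-day trial."""
    return 7


def personalized(features):
    """Hypothetical personalized policy keyed on a made-up segment."""
    return 7 if features["segment"] == "new" else 30


print("uniform 7-day:", ips_value(logs, uniform_7_days, propensity))
print("personalized: ", ips_value(logs, personalized, propensity))
```

The reweighting is what lets a single randomized experiment score many candidate policies offline: each policy is evaluated only on the users who happened to receive the arm that the policy would have chosen for them.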

artificial intelligence, machine learning, treatment effect, (19 more...)

arXiv.org Machine Learning

2006.1342

Country:

Europe > Germany (0.04)
Asia > Japan (0.04)
Oceania > New Zealand (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.86)

Industry: Information Technology > Software (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Learning Gradient Boosted Multi-label Classification Rules

Rapp, Michael, Mencía, Eneldo Loza, Fürnkranz, Johannes, Nguyen, Vu-Linh, Hüllermeier, Eyke

arXiv.org Machine Learning, Jun-23-2020

In multi-label classification, where the evaluation of predictions is less straightforward than in single-label classification, various meaningful, though different, loss functions have been proposed. Ideally, the learning algorithm should be customizable towards a specific choice of the performance measure. Modern implementations of boosting, most prominently gradient boosted decision trees, appear to be appealing from this point of view. However, they are mostly limited to single-label classification, and hence not amenable to multi-label losses unless these are label-wise decomposable. In this work, we develop a generalization of the gradient boosting framework to multi-output problems and propose an algorithm for learning multi-label classification rules that is able to minimize decomposable as well as non-decomposable loss functions. Using the well-known Hamming loss and subset 0/1 loss as representatives, we analyze the abilities and limitations of our approach on synthetic data and evaluate its predictive performance on multi-label benchmarks.
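
The two representative losses named above can be sketched in a few lines. This uses the standard definitions (Hamming loss as the fraction of individual labels predicted wrongly, subset 0/1 loss as the fraction of examples whose full label vector is not exactly right); the function names and toy label matrices are hypothetical.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly.

    Decomposes label by label, so each label can be scored independently.
    """
    n_slots = len(y_true) * len(y_true[0])
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / n_slots


def subset_zero_one_loss(y_true, y_pred):
    """Fraction of examples whose whole label vector is not an exact match.

    Does not decompose over labels: one wrong label fails the example.
    """
    mismatched = sum(
        true_row != pred_row for true_row, pred_row in zip(y_true, y_pred)
    )
    return mismatched / len(y_true)


# Two examples with three labels each; one label is wrong in the first row.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]

print("Hamming loss:   ", hamming_loss(y_true, y_pred))
print("Subset 0/1 loss:", subset_zero_one_loss(y_true, y_pred))
```

A single wrong label costs one slot out of six under Hamming loss but fails one example out of two under subset 0/1 loss, which illustrates why a learner tuned for one measure need not do well on the other.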

artificial intelligence, loss function, machine learning, (17 more...)

arXiv.org Machine Learning

2006.13346

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Austria > Upper Austria > Linz (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
