Uncertainty in Gradient Boosting via Ensembles

Ustimenko, Aleksei, Prokhorenkova, Liudmila, Malinin, Andrey

Jul-2-2020–arXiv.org Machine Learning

Gradient boosting is a powerful machine learning technique that is particularly successful for tasks containing heterogeneous features and noisy data. While gradient boosting classification models return a distribution over class labels, regressions models typically yield only point predictions. However, for many practical, high-risk applications, it is also important to be able to quantify uncertainty in the predictions to avoid costly mistakes. In this work, we examine a probabilistic ensemble-based framework for deriving uncertainty estimates in the predictions of gradient boosting classification and regression models. Crucially, the proposed approach allows the total uncertainty to be decomposed into \textit{data uncertainty}, which comes from the complexity and noise in data distribution, and \textit{knowledge uncertainty}, coming from the lack of information about a given region of the feature space. Two approaches for generating ensembles are considered: Stochastic Gradient Boosting (SGB) and Stochastic Gradient Langevin Boosting (SGLB). Notably, SGLB also enables the generation of a \emph{virtual} ensemble via only one gradient boosting model, which significantly reduces complexity. Experiments on a range of regression and classification datasets show that ensembles of gradient boosting models yield improved predictive performance, and measures of uncertainty successfully enable detection of out-of-domain inputs.

artificial intelligence, ensemble, machine learning, (19 more...)

arXiv.org Machine Learning

Jul-2-2020

arXiv.org PDF

Add feedback

Country:
- Asia > Russia (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)

Genre:
- Research Report > Experimental Study (0.68)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Ensemble Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found