gradient boosting


Partially Interpretable Estimators (PIE): Black-Box-Refined Interpretable Machine Learning

arXiv.org Artificial Intelligence

We propose Partially Interpretable Estimators (PIE), which attribute a prediction to individual features via an interpretable model, while a (possibly) small part of the PIE prediction is attributed to the interaction of features via a black-box model, with the goal of boosting predictive performance while maintaining interpretability. As such, the interpretable model captures the main contributions of features, and the black-box model attempts to complement the interpretable piece by capturing the "nuances" of feature interactions as a refinement. We design an iterative training algorithm to jointly train the two types of models. Experimental results show that PIE is highly competitive with black-box models while outperforming interpretable baselines. In addition, the understandability of PIE is comparable to that of simple linear models, as validated via a human evaluation.
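As a rough illustration of the additive split described above, the sketch below (not the paper's joint training algorithm) fits a linear model first and then lets a gradient boosted model refine its residuals; the dataset and hyperparameters are arbitrary.

```python
# Minimal sketch of the "interpretable + black-box refinement" idea.
# NOT the paper's iterative joint training: the linear part is fit once,
# then a boosted model refines its residuals (feature interactions).
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Interpretable part: attributes the prediction to individual features.
interpretable = Ridge(alpha=1.0).fit(X_tr, y_tr)

# Black-box part: refines the residuals left by the interpretable model.
residuals = y_tr - interpretable.predict(X_tr)
black_box = GradientBoostingRegressor(random_state=0).fit(X_tr, residuals)

pred = interpretable.predict(X_te) + black_box.predict(X_te)
print("linear only:", mean_squared_error(y_te, interpretable.predict(X_te)))
print("PIE-style  :", mean_squared_error(y_te, pred))
```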


Infinitesimal gradient boosting

arXiv.org Machine Learning

We define infinitesimal gradient boosting as a limit of the popular tree-based gradient boosting algorithm from machine learning. The limit is considered in the vanishing-learning-rate asymptotic, that is, when the learning rate tends to zero and the number of gradient trees is rescaled accordingly. For this purpose, we introduce a new class of randomized regression trees bridging totally randomized trees and Extra Trees and using a softmax distribution for binary splitting. Our main result is the convergence of the associated stochastic algorithm and the characterization of the limiting procedure as the unique solution of a nonlinear ordinary differential equation in an infinite-dimensional function space. Infinitesimal gradient boosting defines a smooth path in the space of continuous functions along which the training error decreases, the residuals remain centered, and the total variation is well controlled.
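The vanishing-learning-rate regime can be illustrated numerically: shrink the learning rate nu while rescaling the number of trees as T/nu, so the total "boosting time" T stays fixed. The sketch below uses standard scikit-learn trees, not the paper's softmax-split randomized trees, and is only meant to show the rescaling.

```python
# Illustration of the vanishing-learning-rate asymptotic: smaller nu, more trees,
# with the product (boosting time) held fixed at T.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
T = 10.0  # fixed boosting time
for nu in [0.5, 0.1, 0.02]:
    model = GradientBoostingRegressor(
        learning_rate=nu, n_estimators=int(T / nu), max_depth=2, random_state=0
    ).fit(X, y)
    mse = np.mean((model.predict(X) - y) ** 2)
    print(f"nu={nu:<5} trees={int(T / nu):<4} train MSE={mse:.3f}")
```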


Boosted Embeddings for Time Series Forecasting

arXiv.org Artificial Intelligence

Time series forecasting is a fundamental task emerging from diverse data-driven applications. Many advanced autoregressive methods such as ARIMA have been used to develop forecasting models. Recently, deep-learning-based methods such as DeepAR, NeuralProphet, and Seq2Seq have been explored for the time series forecasting problem. In this paper, we propose a novel time series forecast model, DeepGB. We formulate and implement a variant of gradient boosting wherein the weak learners are DNNs whose weights are incrementally found in a greedy manner over iterations. In particular, we develop a new embedding architecture that improves the performance of many deep learning models on time series when used with this gradient boosting variant. We demonstrate that our model outperforms existing comparable state-of-the-art models on real-world sensor data and a public dataset.
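The general mechanism of boosting with neural-network weak learners can be sketched as below; this is a hedged illustration with a generic MLP base learner on synthetic data, not the paper's DeepGB embedding architecture.

```python
# Hedged sketch of boosting with DNN weak learners: each small MLP fits the
# residuals of the current ensemble and is added with a shrinkage factor.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.neural_network import MLPRegressor

X, y = make_friedman1(n_samples=400, noise=0.5, random_state=0)
prediction = np.full_like(y, y.mean())   # initial constant prediction
learners, lr = [], 0.3

for _ in range(10):
    residual = y - prediction
    weak = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    weak.fit(X, residual)                # each DNN fits the current residuals
    prediction += lr * weak.predict(X)   # greedy additive update
    learners.append(weak)

print("final train MSE:", np.mean((y - prediction) ** 2))
```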


Individually Fair Gradient Boosting

arXiv.org Machine Learning

We consider the task of enforcing individual fairness in gradient boosting. Gradient boosting is a popular method for machine learning from tabular data, which often arise in applications where algorithmic fairness is a concern. At a high level, our approach is a functional gradient descent on a (distributionally) robust loss function that encodes our intuition of algorithmic fairness for the ML task at hand. Unlike prior approaches to individual fairness that only work with smooth ML models, our approach also works with non-smooth models such as decision trees. We show that our algorithm converges globally and generalizes. We also demonstrate the efficacy of our algorithm on three ML problems susceptible to algorithmic bias.
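As background, the functional-gradient-descent machinery with non-smooth (decision-tree) base learners can be sketched as below; the fairness-specific part, i.e., replacing the plain logistic loss with a distributionally robust loss, is not reproduced here.

```python
# Sketch of gradient boosting as functional gradient descent with tree base
# learners and a plain logistic loss. The paper's robust, fairness-encoding
# loss would replace the pseudo-residual computation below.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

X, y = make_classification(n_samples=600, n_features=6, random_state=0)
F = np.zeros(len(y))            # current scores (log-odds)
lr, trees = 0.1, []

for _ in range(100):
    p = 1.0 / (1.0 + np.exp(-F))
    pseudo_residual = y - p     # negative functional gradient of the logistic loss
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, pseudo_residual)
    F += lr * tree.predict(X)   # functional gradient step along the fitted tree
    trees.append(tree)

print("training accuracy:", np.mean((F > 0) == y))
```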


The Glory of XGBoost

#artificialintelligence

There are so many machine learning algorithms out there, so how do you choose the best one for your problem? This question is going to have a different answer depending on the application and the data. Is it classification, regression, supervised, unsupervised, natural language processing, time series? There are so many avenues to take, but in this article I am going to focus on one algorithm that I find particularly interesting: XGBoost. XGBoost stands for extreme gradient boosting and is an open source library that provides an efficient and effective implementation of gradient boosting.
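A minimal usage sketch of the xgboost library is shown below (assuming `pip install xgboost`); the synthetic dataset and parameters are illustrative, not taken from the article.

```python
# Minimal XGBoost classification example on synthetic data.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, model.predict(X_te)))
```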


Dissecting C3.ai's secret sauce: less about AI, more about fixing Hadoop

ZDNet

U.S. patent number 10,824,634, awarded this month. The diagram shows what the company calls a system of integration. A dotted line represents an enclosing wrapper of types that can be referenced to simplify the development of applications that join together resources, such as a data integration unit, a machine learning unit, and a MapReduce component. C3.ai, the software company founded by software industry legend Tom Siebel, which on Friday filed to go public, describes its purpose in life as applying artificial intelligence to sales and marketing. What it is actually doing appears to be much more about fixing the sins of infrastructure software such as Hadoop and its commercial implementations by Cloudera and others.


MP-Boost: Minipatch Boosting via Adaptive Feature and Observation Sampling

arXiv.org Machine Learning

Boosting methods are among the best general-purpose and off-the-shelf machine learning approaches, gaining widespread popularity. In this paper, we seek to develop a boosting method that yields accuracy comparable to popular AdaBoost and gradient boosting methods, yet is computationally faster and whose solution is more interpretable. We achieve this by developing MP-Boost, an algorithm loosely based on AdaBoost that learns by adaptively selecting small subsets of instances and features, or what we term minipatches (MP), at each iteration. By sequentially learning on tiny subsets of the data, our approach is computationally faster than other classic boosting algorithms. Also, as it progresses, MP-Boost adaptively learns probability distributions over the features and instances that upweight the most important features and most challenging instances, hence adaptively selecting the most relevant minipatches for learning. These learned probability distributions also aid in the interpretation of our method. We empirically demonstrate the interpretability, comparative accuracy, and computational time of our approach on a variety of binary classification tasks.
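A simplified version of the minipatch idea can be sketched as follows: each boosting round fits a shallow tree on a small random subset of observations and features. Uniform sampling is used here in place of the paper's adaptively learned sampling distributions.

```python
# Hedged sketch of minipatch boosting with uniform (non-adaptive) sampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=30, random_state=0)
y_pm = 2 * y - 1                      # labels in {-1, +1}
F, lr = np.zeros(len(y)), 0.1
ensemble = []

for _ in range(200):
    rows = rng.choice(len(y), size=100, replace=False)     # minipatch: 100 observations
    cols = rng.choice(X.shape[1], size=5, replace=False)   # ... and 5 features
    residual = y_pm / (1 + np.exp(y_pm * F))                # negative gradient of logistic loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X[np.ix_(rows, cols)], residual[rows])
    F += lr * tree.predict(X[:, cols])
    ensemble.append((tree, cols))

print("training accuracy:", np.mean((F > 0) == y))
```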


Fast Gradient Boosting with CatBoost

#artificialintelligence

In gradient boosting, predictions are made by an ensemble of weak learners. Unlike a random forest, which builds its decision trees independently of one another, gradient boosting creates trees one after the other. Previous trees in the model are not altered; results from the previous trees are used to improve the next one. In this piece, we'll take a closer look at a gradient boosting library called CatBoost.
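Below is a minimal usage sketch of the catboost library (assuming `pip install catboost`); the toy data and the categorical column are illustrative only.

```python
# Minimal CatBoost example with a categorical feature handled natively.
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=500),   # categorical feature
    "size": rng.normal(size=500),                              # numeric feature
})
y = ((df["size"] + (df["color"] == "red")) > 0.5).astype(int)

model = CatBoostClassifier(iterations=200, depth=4, learning_rate=0.1, verbose=0)
model.fit(df, y, cat_features=["color"])                       # no manual encoding needed
pred = model.predict(df).astype(int)
print("training accuracy:", (pred == y.values).mean())
```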


A Generalized Stacking for Implementing Ensembles of Gradient Boosting Machines

arXiv.org Machine Learning

The gradient boosting machine is one of the most powerful tools for solving regression problems. In order to cope with its shortcomings, an approach for constructing ensembles of gradient boosting models is proposed. The main idea behind the approach is to use the stacking algorithm to learn a second-level meta-model, which can be regarded as a model for implementing various ensembles of gradient boosting models. First, a linear regression of the gradient boosting models is considered as the simplest realization of the meta-model, under the condition that the linear model is differentiable with respect to its coefficients (weights). Then it is shown that the proposed approach can be simply extended to arbitrary differentiable combination models, for example, neural networks, which are differentiable and can implement arbitrary functions of gradient boosting models. Various numerical examples illustrate the proposed approach.
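The simplest linear meta-model described above can be sketched as follows: several gradient boosting models are combined by a linear regression trained on their holdout predictions. This is a simplified stand-in for the paper's procedure and does not include the neural-network meta-models.

```python
# Hedged sketch of stacking gradient boosting models with a linear meta-model.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=800, noise=1.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# First level: a few GBMs with different tree depths.
gbms = [GradientBoostingRegressor(max_depth=d, random_state=0).fit(X_tr, y_tr)
        for d in (1, 3, 5)]

# Second level: linear weights learned on holdout predictions (the stacking step).
Z_val = np.column_stack([m.predict(X_val) for m in gbms])
meta = LinearRegression().fit(Z_val, y_val)

print("learned weights:", meta.coef_, "intercept:", meta.intercept_)
```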