 Ensemble Learning


Minimal Variance Sampling in Stochastic Gradient Boosting

Neural Information Processing Systems

Stochastic Gradient Boosting (SGB) is a widely used approach to regularization of boosting models based on decision trees. It was shown that, in many cases, random sampling at each iteration can lead to better generalization performance of the model and can also decrease the learning time. Different sampling approaches were proposed, where probabilities are not uniform, and it is not currently clear which approach is the most effective. In this paper, we formulate the problem of randomization in SGB in terms of optimization of sampling probabilities to maximize the estimation accuracy of split scoring used to train decision trees. This optimization problem has a closed-form nearly optimal solution, and it leads to a new sampling technique, which we call Minimal Variance Sampling (MVS). The method both decreases the number of examples needed for each iteration of boosting and increases the quality of the model significantly as compared to the state-of-the-art sampling methods. The superiority of the algorithm was confirmed by introducing MVS as a new default option for subsampling in CatBoost, a gradient boosting library achieving state-of-the-art quality on various machine learning tasks.
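
As a rough illustration of what optimizing sampling probabilities can look like, the sketch below assumes a thresholded form p_i = min(1, s_i / mu), where s_i is a per-example score (here simply |g_i|, the absolute gradient) and mu is chosen so that the expected sample size matches a target. The abstract does not state this formula, and the function names are hypothetical; this is a sketch under those assumptions, not CatBoost's implementation.

```python
import random


def mvs_probabilities(scores, sample_size, iters=100):
    """Return p_i = min(1, s_i / mu) with sum(p) close to sample_size.

    Assumes every score is strictly positive and 0 < sample_size <= len(scores).
    """
    # The expected sample size decreases as mu grows, so bisection applies.
    lo = min(scores)                # here every p_i is 1, so the sum is >= target
    hi = sum(scores) / sample_size  # here the sum is <= target
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        expected = sum(min(1.0, s / mu) for s in scores)
        if expected > sample_size:
            lo = mu
        else:
            hi = mu
    return [min(1.0, s / hi) for s in scores]


def mvs_sample(gradients, sample_size, rng):
    """Keep example i with probability p_i and return (index, weight) pairs."""
    # Assumption: the score is |g_i|; a tiny offset keeps it strictly positive.
    scores = [abs(g) + 1e-12 for g in gradients]
    probs = mvs_probabilities(scores, sample_size)
    # Weighting by 1 / p_i keeps weighted gradient sums unbiased.
    return [(i, 1.0 / p) for i, p in enumerate(probs) if rng.random() < p]
```

Examples with large scores are always kept, while the rest are kept at random and up-weighted by 1 / p_i, so the sample size is fixed only in expectation.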


A Generative AI Technique for Synthesizing a Digital Twin for U.S. Residential Solar Adoption and Generation

arXiv.org Artificial Intelligence

Residential rooftop solar adoption is considered crucial for reducing carbon emissions. The lack of photovoltaic (PV) data at a finer resolution (e.g., household, hourly levels) poses a significant roadblock to informed decision-making. We discuss a novel methodology to generate a highly granular, residential-scale realistic dataset for rooftop solar adoption across the contiguous United States. The data-driven methodology consists of: (i) integrated machine learning models to identify PV adopters, (ii) methods to augment the data using explainable AI techniques to glean insights about key features and their interactions, and (iii) methods to generate household-level hourly solar energy output using an analytical model. The resulting synthetic datasets are validated using real-world data and can serve as a digital twin for modeling downstream tasks. Finally, a policy-based case study utilizing the digital twin for Virginia demonstrated increased rooftop solar adoption with the 30% Federal Solar Investment Tax Credit, especially in Low-to-Moderate-Income communities.


Regularized Gradient Boosting

Neural Information Processing Systems

Gradient Boosting (GB) is a popular and very successful ensemble method for binary trees. While various types of regularization of the base predictors are used with this algorithm, the theory that connects such regularizations with generalization guarantees is poorly understood. We fill this gap by deriving data-dependent learning guarantees for GB used with regularization, expressed in terms of the Rademacher complexities of the constrained families of base predictors. We introduce a new algorithm, called RGB, that directly benefits from these generalization bounds and that, at every boosting round, applies the Structural Risk Minimization principle to search for a base predictor with the best empirical fit versus complexity trade-off. Inspired by Randomized Coordinate Descent, we provide a scalable implementation of our algorithm, able to search over large families of base predictors.


From global to local MDI variable importances for random forests and when they are Shapley values

Neural Information Processing Systems

Random forests have been widely used for their ability to provide so-called importance measures, which give insight at a global (per dataset) level on the relevance of input variables to predict a certain output. On the other hand, methods based on Shapley values have been introduced to refine the analysis of feature relevance in tree-based models to a local (per instance) level. In this context, we first show that the global Mean Decrease of Impurity (MDI) variable importance scores correspond to Shapley values under some conditions. Then, we derive a local MDI importance measure of variable relevance, which has a very natural connection with the global MDI measure and can be related to a new notion of local feature relevance. We further link local MDI importances with Shapley values and discuss them in the light of related measures from the literature. The measures are illustrated through experiments on several classification and regression problems.


Margins are Insufficient for Explaining Gradient Boosting

Neural Information Processing Systems

Boosting is one of the most successful ideas in machine learning, achieving great practical performance with little fine-tuning. The success of boosted classifiers is most often attributed to improvements in margins. The focus on margin explanations was pioneered in the seminal work by Schapire et al. (1998) and has culminated in the k'th margin generalization bound by Gao and Zhou (2013), which was recently proved to be near-tight for some data distributions (Grønlund et al. 2019). In this work, we first demonstrate that the k'th margin bound is inadequate in explaining the performance of state-of-the-art gradient boosters. We then explain the shortcomings of the k'th margin bound and prove a stronger and more refined margin-based generalization bound that indeed succeeds in explaining the performance of modern gradient boosters.


MEMS Gyroscope Multi-Feature Calibration Using Machine Learning Technique

arXiv.org Artificial Intelligence

Gyroscopes are crucial for accurate angular velocity measurements in navigation, stabilization, and control systems. MEMS gyroscopes offer advantages like compact size and low cost but suffer from errors and inaccuracies that are complex and time varying. This study leverages machine learning (ML) and uses multiple signals of the MEMS resonator gyroscope to improve its calibration. XGBoost, known for its high predictive accuracy and ability to handle complex, non-linear relationships, and MLP, recognized for its capability to model intricate patterns through multiple layers and hidden dimensions, are employed to enhance the calibration process. Our findings show that both XGBoost and MLP models significantly reduce noise and enhance accuracy and stability, outperforming the traditional calibration techniques. Despite higher computational costs, DL models are ideal for high-stakes applications, while ML models are efficient for consumer electronics and environmental monitoring. Both ML and DL models demonstrate the potential of advanced calibration techniques in enhancing MEMS gyroscope performance and calibration efficiency.


Reviews: LightGBM: A Highly Efficient Gradient Boosting Decision Tree

Neural Information Processing Systems

The paper presents two nice ways of improving the usual gradient boosting algorithm where the weak classifiers are decision trees. It is a paper oriented towards an efficient (less costly) implementation of the usual algorithm, speeding up the learning of decision trees by taking into account previous computations and sparse data. The approaches are interesting and smart. A risk bound is given for one of the improvements (GOSS), which seems sound but still quite loose: according to the experiments, a tighter bound could be obtained, getting rid of the "max" sizes of considered sets. No guarantee is given for the second improvement (EFB), although it seems to be quite efficient in practice.
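
The review names GOSS (Gradient-based One-Side Sampling) without describing it. A minimal sketch of the idea as it is commonly described for LightGBM follows: keep the fraction `a` of examples with the largest absolute gradients, sample a fraction `b` of the data at random from the rest, and up-weight the sampled part by (1 - a) / b. The function name and the return format are hypothetical, and this is not LightGBM's actual code.

```python
import random


def goss_sample(gradients, a, b, rng):
    """Return (index, weight) pairs chosen by one-side sampling.

    a: fraction of the data kept for having the largest |gradient|.
    b: fraction of the data drawn at random from the remaining examples.
    """
    n = len(gradients)
    # Indices ordered by decreasing absolute gradient.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(a * n)
    rand_n = int(b * n)
    top = order[:top_n]
    rest = order[top_n:]
    sampled = rng.sample(rest, min(rand_n, len(rest)))
    # Up-weight the randomly sampled small-gradient examples so their
    # gradient sum stands in for all the examples that were not in the top part.
    weight = (1.0 - a) / b
    return [(i, 1.0) for i in top] + [(i, weight) for i in sampled]
```

With `a = 0.25` and `b = 0.5` on eight examples, the two largest-gradient examples are kept with weight 1.0 and four of the remaining six are kept with weight 1.5.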


Reviews: Cost efficient gradient boosting

Neural Information Processing Systems

Thus, the paper is similar to the work of Xu et al., 2012. The main differences are the fact that the feature and evaluation costs are input-specific, the evaluation cost depends on the number of tree splits, their optimization approach is different (based on the Taylor expansion around T_{k-1}, as described in the XGBoost paper), and they use best-first growth to grow the trees to a maximum number of splits (instead of a max depth). The authors point out that their setup works either in the case where feature cost dominates or evaluation cost dominates and they show experimental results for these settings.
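
For reference, the second-order Taylor expansion used in the XGBoost paper can be written as follows. The notation is an assumption made here for illustration: \ell is the loss, f_k is the tree added at round k, \Omega is a regularizer, and T_{k-1} is the ensemble after k-1 rounds. The cost terms specific to the reviewed paper are not shown.

```latex
\mathcal{L}^{(k)} \approx \sum_{i=1}^{n} \Big[ \ell\big(y_i, T_{k-1}(x_i)\big)
    + g_i\, f_k(x_i) + \tfrac{1}{2}\, h_i\, f_k(x_i)^2 \Big] + \Omega(f_k),
\qquad
g_i = \frac{\partial \ell(y_i, \hat{y})}{\partial \hat{y}}\bigg|_{\hat{y} = T_{k-1}(x_i)},
\quad
h_i = \frac{\partial^2 \ell(y_i, \hat{y})}{\partial \hat{y}^2}\bigg|_{\hat{y} = T_{k-1}(x_i)}
```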


Reviews: Multi-Layered Gradient Boosting Decision Trees

Neural Information Processing Systems

Short overview: The authors propose to build a neural network using gradient boosted trees as components in the layers. To train such a structure, since GBDTs are not able to propagate the gradient, they propose a method inspired by target propagation: each gradient boosted tree is built to approximate the gradient of a loss between the prediction function and a pseudo target, with respect to the prediction function. Pseudo targets are updated at each iteration using the reverse mapping of the built tree representation and the pseudo label of the next layer. The reverse mapping can be found using the reconstruction loss. At each iteration, each layer's ensemble grows by one boosting tree. The authors hint at potential applications in blocking adversarial attacks that rely on estimating the gradients of the final loss with respect to the input, which would not work for layers that cannot propagate gradients; however, this direction is not explored in this paper. Detailed comments: Overall, an interesting idea of co-training GBDTs with NNs.


Reviews: CatBoost: unbiased boosting with categorical features

Neural Information Processing Systems

UPDATE AFTER AUTHORS' RESPONSE Regarding "using one tree structure", I think I understand now, and I think the current wording is confusing. Both the manuscript and the response made me think that the *same* tree splits (internal nodes) are used for all of the boosting iterations. But looking at the argmin line in Algorithm 2, I think the intent is to say "the same feature is used to split all internal nodes at a given level of a tree" (aka, oblivious tree). If that is not right, then I am still confused. Regarding one random permutation, please update text to be more clear.
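
To make the "oblivious tree" reading concrete, here is a minimal sketch of prediction with such a tree. It assumes the usual definition in which every node at a given level shares the same feature and the same threshold (the review only mentions the feature); the function name and data layout are hypothetical.

```python
def oblivious_tree_predict(x, splits, leaf_values):
    """Predict with an oblivious tree.

    splits: one (feature_index, threshold) pair per tree level.
    leaf_values: list of 2 ** len(splits) leaf predictions.
    """
    index = 0
    for feature, threshold in splits:
        # Every node on this level applies the same test, so the path to a
        # leaf is just the sequence of test outcomes read as a binary number.
        bit = 1 if x[feature] > threshold else 0
        index = (index << 1) | bit
    return leaf_values[index]
```

For a depth-2 tree with `splits = [(0, 0.5), (1, 2.0)]`, the input `[0.7, 1.0]` passes the first test and fails the second, so it lands in leaf index 2 (binary 10).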