 Ensemble Learning


Reviews: Minimal Variance Sampling in Stochastic Gradient Boosting

Neural Information Processing Systems

Update: I read the authors' response. Regarding the point that the sampling rate does not tell the whole story: I was suggesting that the authors add information about how many instances were used, on average, for each of the splits (because this is not equal to sampling rate * total dataset size). I am keeping my accept rating, hoping that the authors make the changes to improve the derivations and clarity in the final submission.

Summary: This paper is concerned with a common trick that many GBDT implementations apply: subsampling instances to speed up the calculations for finding the best split. The authors formulate the choice of instances to sample as an optimization problem and derive a modified sampling scheme that aims to mimic, using a gain calculated only on the subsampled instances, the gain that would be assigned to a split on all of the data. The experiments demonstrate good results. The paper is well written and easy to follow, apart from a couple of places in the derivations (see my questions).


Reviews: Minimal Variance Sampling in Stochastic Gradient Boosting

Neural Information Processing Systems

The authors propose a non-uniform sampling strategy for stochastic gradient boosted decision trees. In particular, sampling probability of the training data is optimized towards maximizing the estimation accuracy of the splitting score of decision trees. The optimization problem allows an approximate closed-form solution. Experiment results demonstrate superior performance of the proposed strategy. The reviewers agree that the paper can not only help understand sampling within GBDT from a more rigorous perspective but also improve GBDT implementations in practice.
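To make the idea of non-uniform sampling concrete, here is a minimal Python sketch. It is not the scheme derived in the paper: it assumes a simpler rule in which each instance is kept with probability proportional to its absolute gradient (capped at 1) and is reweighted by the inverse of that probability, so that the weighted gradient sum stays an unbiased estimate of the full sum. The function names `sample_by_gradient` and `estimate_gradient_sum` are hypothetical and chosen only for illustration.

```python
import random


def sample_by_gradient(gradients, sample_rate, rng):
    """Keep instance i with probability p_i proportional to |g_i|, capped at 1.

    Returns (index, weight) pairs with weight = 1 / p_i. This is an
    illustrative rule, not the sampling scheme derived in the paper.
    """
    n = len(gradients)
    total = sum(abs(g) for g in gradients)
    target = sample_rate * n  # desired expected sample size before capping
    kept = []
    for i, g in enumerate(gradients):
        # Fall back to uniform sampling if every gradient is zero.
        p = min(1.0, target * abs(g) / total) if total > 0 else sample_rate
        if p > 0 and rng.random() < p:
            kept.append((i, 1.0 / p))
    return kept


def estimate_gradient_sum(gradients, kept):
    """Inverse-probability-weighted estimate of the full gradient sum."""
    return sum(weight * gradients[i] for i, weight in kept)


rng = random.Random(0)
gradients = [rng.gauss(0.0, 1.0) for _ in range(1000)]
kept = sample_by_gradient(gradients, sample_rate=0.3, rng=rng)
print("instances kept:", len(kept), "of", len(gradients))
print("full sum:", sum(gradients))
print("estimated sum:", estimate_gradient_sum(gradients, kept))
```

Because the probabilities are capped at 1, the number of instances actually kept can fall below `sample_rate * n`, and the count reaching any individual split depends on how the sample distributes over the tree nodes.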


coverforest: Conformal Predictions with Random Forest in Python

arXiv.org Machine Learning

Conformal prediction provides a framework for uncertainty quantification, specifically in the forms of prediction intervals and sets with distribution-free guaranteed coverage. While recent cross-conformal techniques such as CV+ and Jackknife+-after-bootstrap achieve better data efficiency than traditional split conformal methods, they incur substantial computational costs due to required pairwise comparisons between training and test samples' out-of-bag scores. Observing that these methods naturally extend from ensemble models, particularly random forests, we leverage existing optimized random forest implementations to enable efficient cross-conformal predictions. We present coverforest, a Python package that implements efficient conformal prediction methods specifically optimized for random forests. coverforest supports both regression and classification tasks through various conformal prediction methods, including split conformal, CV+, Jackknife+-after-bootstrap, and adaptive prediction sets. Our package leverages parallel computing and Cython optimizations to speed up out-of-bag calculations. Our experiments demonstrate that coverforest's predictions achieve the desired level of coverage. In addition, its training and prediction times can be 2 to 9 times faster than those of an existing implementation. The source code for coverforest is hosted on GitHub at https://github.com/donlapark/coverforest.
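As a point of reference for the simplest method listed above, the following is a minimal sketch of split conformal prediction for regression in plain Python. It does not use coverforest's API. The `predict` function is a hypothetical stand-in for a random forest fitted on a separate training split, and the synthetic data and the names `conformal_radius` and `draw` are assumptions made for illustration.

```python
import math
import random


def conformal_radius(abs_residuals, alpha):
    """Half-width of a split conformal interval at miscoverage level alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest calibration residual,
    and returns infinity when the calibration set is too small.
    """
    n = len(abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(abs_residuals)[k - 1]


def predict(x):
    # Hypothetical stand-in for a model fitted on a separate training split.
    return 2.0 * x


rng = random.Random(0)


def draw(n):
    """Synthetic data: y = 2x + Gaussian noise (assumed for illustration)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.5)) for x in xs]


calibration = draw(500)
test_points = draw(2000)

# Calibrate on held-out residuals, then form intervals predict(x) +/- radius.
radius = conformal_radius([abs(y - predict(x)) for x, y in calibration], alpha=0.1)
covered = sum(1 for x, y in test_points if abs(y - predict(x)) <= radius)
coverage = covered / len(test_points)
print("interval half-width:", radius)
print("empirical coverage:", coverage)
```

Split conformal spends part of the data on calibration only, which is the data-efficiency cost that CV+ and Jackknife+-after-bootstrap are meant to reduce.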


Reviews: Regularized Gradient Boosting

Neural Information Processing Systems

Gradient boosting (GB) has been extensively studied in the past, both theoretically and experimentally. Recently, with the advent of big data, several accelerated versions of vanilla GB have been proposed (in particular the well-known XGBoost), and while the experimental evaluations of these methods have been abundant, the same cannot be said for the theoretical analysis. In this paper, the authors tackle this important problem. The main contribution of this paper consists in casting the various accelerated GB methods in a regularized gradient boosting setting. Indeed, by introducing a regularization term in the usual minimization objective of GB, it is possible to recover most, if not all, of the various accelerated gradient boosting approaches (XGBoost included), while at the same time opening up several interesting and exciting possibilities for deriving new acceleration strategies.


Reviews: Regularized Gradient Boosting

Neural Information Processing Systems

This paper proposes Rademacher generalization bounds for Regularized Gradient Boosting, which encompasses various accelerated GB methods. Although there is still some work to be done to make the algorithm derived from the theoretical study faster, the theoretical study itself deserves publication.


A review on development of eco-friendly filters in Nepal for use in cigarettes and masks and Air Pollution Analysis with Machine Learning and SHAP Interpretability

arXiv.org Artificial Intelligence

In Nepal, air pollution is a serious public health concern, especially in cities like Kathmandu where particulate matter (PM2.5 and PM10) has a major influence on respiratory health and air quality. The Air Quality Index (AQI) is predicted in this work using a Random Forest Regressor, and the model's predictions are interpreted using SHAP (SHapley Additive exPlanations) analysis. With the lowest Testing RMSE (0.23) and flawless R2 scores (1.00), CatBoost performs better than other models, demonstrating its greater accuracy and generalization which is cross validated using a nested cross validation approach. NowCast Concentration and Raw Concentration are the most important elements influencing AQI values, according to SHAP research, which shows that the machine learning results are highly accurate. Their significance as major contributors to air pollution is highlighted by the fact that high values of these characteristics significantly raise the AQI. This study investigates the Hydrogen-Alpha (HA) biodegradable filter as a novel way to reduce the related health hazards. With removal efficiency of more than 98% for PM2.5 and 99.24% for PM10, the HA filter offers exceptional defense against dangerous airborne particles. These devices, which are biodegradable face masks and cigarette filters, address the environmental issues associated with traditional filters' non-biodegradable trash while also lowering exposure to air contaminants.


Review for NeurIPS paper: Margins are Insufficient for Explaining Gradient Boosting

Neural Information Processing Systems

Weaknesses: UPDATE: I read the authors' reply and I do not agree. In this text, I will focus on the two-class problem, {-1, 1}, for simplicity. First, GB combines regressors, not classifiers, and their outputs cannot be normalized as classifiers. Second, the training of GB cannot be decoupled from the sigmoid, as the pseudo-residuals are computed as the sigmoid times the class (Friedman 1999, section 4.5). In fact, the output of the raw function of GB, that is F(x), tends to the log-odds ratio of the two classes.
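The reviewer's second point can be illustrated with a short Python sketch. It assumes Friedman's two-class parameterization with labels in {-1, 1} and loss log(1 + exp(-2yF)), under which the pseudo-residual is 2y / (1 + exp(2yF)) and the best constant score is one half of the log-odds. The factor of one half comes from that assumed parameterization, and the helper names are hypothetical.

```python
import math


def pseudo_residuals(labels, scores):
    """Negative gradient of log(1 + exp(-2 * y * F)) with respect to F.

    Each residual is the label times a sigmoid-shaped factor of the raw score.
    """
    return [2.0 * y / (1.0 + math.exp(2.0 * y * f)) for y, f in zip(labels, scores)]


def half_log_odds(labels):
    """Best constant score under the assumed loss: 0.5 * log(p / (1 - p))."""
    p = sum(1 for y in labels if y == 1) / len(labels)
    return 0.5 * math.log(p / (1.0 - p))


labels = [1] * 30 + [-1] * 10
f0 = half_log_odds(labels)
residuals = pseudo_residuals(labels, [f0] * len(labels))

# At the log-odds solution the pseudo-residuals cancel out, so a constant
# model has nothing left to fit.
print("constant score:", f0)
print("sum of pseudo-residuals:", sum(residuals))
```

The residuals depend on the raw score only through the sigmoid-shaped factor, which is why the raw output F(x) is tied to the class log-odds rather than to a normalized vote.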


Review for NeurIPS paper: Margins are Insufficient for Explaining Gradient Boosting

Neural Information Processing Systems

R2 supports rejection, noting that the results do not directly take into account some specificities of gradient boosting (GB) learning algorithms, in particular the problem of normalizing the regressors that have to be combined. That said, the theory presented in the paper is fairly general and gives new insights into (gradient) boosting methods; it makes progress on margin bounds in both directions (lower and upper bounds) with respect to the current state of the art. The wide use of (gradient) boosting methods makes the paper interesting for the community. Based on these positive points, I recommend acceptance. However, the authors should consider revising their paper according to the following points: - The theory provided is rather general, not specific to GB, and must be presented accordingly.


Reviews: Fast and Flexible Monotonic Functions with Ensembles of Lattices

Neural Information Processing Systems

The paper is heavily based on the previous work on lattice (and monotone lattice) regression, and is thus not self-contained. The main part of the paper does not even explain what function a lattice actually represents (two examples are found in the supplementary materials, but I still found the explanation too brief). I think this should definitely be a part of the main paper; otherwise the reader is forced to check the previous work on this topic to understand what kind of model is actually being considered. While (7) is convex when the calibrators are fixed, it is not clear whether it is convex when jointly optimized over calibration and lattice parameters (I doubt it is). Moreover, (3) seems to be a highly complex, combinatorial optimization criterion.


Reviews: Pruning Random Forests for Prediction on a Budget

Neural Information Processing Systems

The idea of taking into account feature costs when pruning tree ensembles is original to the best of my knowledge. The main originality of the proposed approach is that it adopts a bottom-up post-pruning strategy, while most existing approaches are top-down, acting during tree growing. While the authors present this feature as an advantage of their method, I'm not convinced that adopting a bottom-up strategy is a good idea for addressing this problem. Since the algorithm cannot modify the existing tree structure (it can only prune it), it should be less efficient in terms of feature cost reduction than top-down methods, which can have a direct impact on the features selected at tree nodes. For example, let us assume that two very important features in the dataset carry the exact same information about the output (i.e., they are redundant).