 Ensemble Learning


Mondrian Forests: Efficient Online Random Forests

Neural Information Processing Systems

Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics. Random forests achieve competitive predictive performance and are computationally efficient to train and test, making them excellent candidates for real-world prediction tasks. The most popular random forest variants (such as Breiman's random forest and extremely randomized trees) operate on batches of training data. Online methods are now in greater demand. Existing online random forests, however, require more training data than their batch counterparts to achieve comparable predictive performance. In this work, we use Mondrian processes (Roy and Teh, 2009) to construct ensembles of random decision trees we call Mondrian forests. Mondrian forests can be grown in an incremental/online fashion and, remarkably, the distribution of online Mondrian forests is the same as that of batch Mondrian forests. Mondrian forests achieve competitive predictive performance comparable with existing online random forests and periodically retrained batch random forests, while being more than an order of magnitude faster, thus representing a better computation vs accuracy tradeoff.
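As a rough illustration of the Mondrian process that the abstract builds on, the sketch below samples a random axis-aligned partition of a fixed box. It assumes the standard generative description (exponential split times with rate equal to the sum of the box's side lengths, split dimension chosen in proportion to side length, split location uniform). The function names, the dictionary layout, and the `lifetime` parameter name are illustrative choices, and the sketch does not implement the paper's data-dependent trees or their online extension.

```python
import random


def sample_mondrian(lower, upper, lifetime, start_time=0.0, rng=None):
    """Sample a Mondrian partition of the box [lower, upper] (illustrative sketch)."""
    rng = rng or random.Random()
    ranges = [u - l for l, u in zip(lower, upper)]
    rate = sum(ranges)  # split rate is the sum of the side lengths
    leaf = {"lower": list(lower), "upper": list(upper), "time": start_time}
    if rate <= 0.0:
        return leaf  # degenerate box, nothing left to split
    split_time = start_time + rng.expovariate(rate)
    if split_time > lifetime:
        return leaf  # the budget ran out before the next split
    # Pick the split dimension in proportion to its side length,
    # then a split location uniformly along that side.
    dim = rng.choices(range(len(ranges)), weights=ranges)[0]
    loc = rng.uniform(lower[dim], upper[dim])
    left_upper = list(upper)
    left_upper[dim] = loc
    right_lower = list(lower)
    right_lower[dim] = loc
    return {
        "lower": list(lower),
        "upper": list(upper),
        "time": split_time,
        "dim": dim,
        "loc": loc,
        "left": sample_mondrian(lower, left_upper, lifetime, split_time, rng),
        "right": sample_mondrian(right_lower, upper, lifetime, split_time, rng),
    }


def leaves(node):
    """Return the (lower, upper) boxes of all leaf cells."""
    if "left" not in node:
        return [(node["lower"], node["upper"])]
    return leaves(node["left"]) + leaves(node["right"])


if __name__ == "__main__":
    tree = sample_mondrian([0.0, 0.0], [1.0, 1.0], lifetime=2.0, rng=random.Random(0))
    print(len(leaves(tree)), "leaf cells")
```

A larger `lifetime` yields finer partitions on average; a forest would draw several such trees independently.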


Review for NeurIPS paper: Model Class Reliance for Random Forests

Neural Information Processing Systems

This is a relevant and timely paper that has been reviewed by four knowledgeable referees, who also thoroughly considered the authors' response to their initial reviews. Three of these reviewers recommend acceptance, providing detailed suggestions on how to improve this work before its final submission. The fourth, R3, upheld a dissenting opinion after discussion with the other referees. In my opinion, R3 correctly points out that if the proposed approach aims to improve runtime with an approximate algorithm, this must be sufficiently demonstrated in experiments against straightforward alternatives (such as retraining-based methods). That has been done neither in the original submission nor in the rebuttal.


Review for NeurIPS paper: Gradient Boosted Normalizing Flows

Neural Information Processing Systems

Weaknesses: No increase in theoretical flow expressivity: Unlike traditional boosting, in which an ensemble of weak learners is provably more expressive, the paper doesn't provide such a proof for the proposed NF boosting procedure. Moreover, I conjecture that this methodology (in the general case, under NN / polynomial universal approximation assumptions) *cannot* build an ensemble that is more expressive than a single constituent component. There are two bottlenecks in NF expressivity, the base distribution and the class of transformation functions [Papamakarios et al., 2019], and the proposed method does not fundamentally change either of these. For example, the base distribution is simple and shared across all components (line 99). Recent work that does improve flow expressivity must use mixture formulations [Papamakarios et al., 2019] (discrete [Dinh et al., 2019] or continuous [Cornish et al., 2020] indices) whose base distribution (or support) and transformation change according to the index.
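For orientation, the two bottlenecks named in the review can be read off the usual change-of-variables density. The notation below is assumed for illustration and is not taken from the paper: $p_0$ is the shared base density, $f_k$ is the invertible map of component $k$ from data space to base space, and $\rho_k$ are mixture weights.

```latex
p_k(x) = p_0\big(f_k(x)\big)\,\left|\det \frac{\partial f_k(x)}{\partial x}\right|,
\qquad
p_{\mathrm{mix}}(x) = \sum_{k=1}^{K} \rho_k\, p_k(x),
\quad \rho_k \ge 0,\ \ \sum_{k=1}^{K} \rho_k = 1 .
```

In this form every component uses the same $p_0$ and draws $f_k$ from the same function class, which are the two ingredients the reviewer says remain unchanged.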


Review for NeurIPS paper: Gradient Boosted Normalizing Flows

Neural Information Processing Systems

The paper describes a way to create mixtures of normalizing-flow models using gradient boosting. Combining several simple flow models is an alternative to increasing the capacity of a single model, and is worth exploring. One of the main concerns the reviewers expressed is that of limited novelty, in that the proposed method is largely an application and continuation of existing techniques. However, the reviewers agree that the paper is well written and well executed, that although the idea is incremental there are still things to be said about applying gradient boosting to flows, and that the experiments are well done. For these reasons, I'm happy to recommend acceptance of the paper.


Review for NeurIPS paper: Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks

Neural Information Processing Systems

Additional Feedback: I read the author feedback. It answers my question well and is consistent with what I assumed in the original review. Therefore, I maintain my positive evaluation. In my understanding, in standard multi-scale GNNs, there are nonlinear activations in between the aggregation functions G. In this paper, there is no nonlinear activation in between the aggregation functions G. Nonlinear activation appears only in B. Therefore, the "graph" part G is always linear. Is there such a multi-scale GNN in the literature?


Review for NeurIPS paper: Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks

Neural Information Processing Systems

The paper considers multi-scale GNNs, which have been shown to address over-smoothing issues with standard GNNs, and establishes optimization and generalization guarantees from the perspective of gradient boosting. The paper also suggests GB-GNN with linear transformations, and illustrates that the model can be competitive with the state-of-the-art. Most reviewers felt that the work presents a unique perspective on the performance of multi-scale GNNs. There are some concerns regarding the work: the technical results follow from assumptions and existing results on transductive learning, so there is limited core technical novelty. The work analyzes linear transformations, which differ from the nonlinear transformations often used in practice.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

The authors study the online setting for boosting in the context of regression problems. Specifically, they describe and analyze two algorithms for online boosting for regression: (1) a boosting algorithm that uses a linear span of the base learning functions as the prediction function (i.e., the standard case) and (2) a boosting algorithm that uses a convex hull (CH) of the base functions as the prediction function. Algorithm (1) more closely aligns with existing gradient boosting approaches and provides the most practical insight. Algorithm (2) has some nice theoretical properties with respect to being optimal for the specified setting (and may give some insight into optimality in the span(F) case of algorithm (1)). Experiments are also performed on 14 standard datasets and show that the proposed approaches outperform the base learners on average (and nearly universally when looking at the supplementary material).


Smart IoT Security: Lightweight Machine Learning Techniques for Multi-Class Attack Detection in IoT Networks

arXiv.org Artificial Intelligence

In the growing terrain of the Internet of Things (IoT), it is vital that networks are secure to protect against a range of cyber threats. Building on a strong machine learning framework, this study proposes novel lightweight ensemble approaches for improving multi-class attack detection on IoT devices. Using the large CICIoT 2023 dataset, with 34 attack types distributed among 10 attack categories, we systematically evaluated the performance of a wide variety of modern machine learning methods with the aim of establishing the best-performing algorithmic choice to secure IoT applications. In particular, we explore approaches based on ML classifiers to tackle the challenges posed by the heterogeneous nature of attack vectors in IoT environments. The method that performed best was the Decision Tree, with an accuracy of 99.56% and an F1 score of 99.62%, showing that this model is capable of accurately and reliably detecting threats. The Random Forest model was the next best-performing model, with an accuracy of 98.22% and an F1 score of 98.24%, suggesting that ML methods are quite effective for high-dimensional data. Our results highlight the potential for using ML classifiers in bolstering security for IoT devices and also serve as motivation for future investigations targeting scalable, keystroke-based attack detection systems. We believe that our method provides a new path to develop complex machine learning algorithms for low-resource IoT devices, balancing both accuracy and time efficiency needs. In summary, these contributions enrich the state of the art of the IoT security literature, laying down solid ground and guidelines for the deployment of smart, adaptive security in IoT settings.


TabPFN Unleashed: A Scalable and Effective Solution to Tabular Classification Problems

arXiv.org Artificial Intelligence

TabPFN has emerged as a promising in-context learning model for tabular data, capable of directly predicting the labels of test samples given labeled training examples. It has demonstrated competitive performance, particularly on small-scale classification tasks. However, despite its effectiveness, TabPFN still requires further refinement in several areas, including handling high-dimensional features, aligning with downstream datasets, and scaling to larger datasets. In this paper, we revisit existing variants of TabPFN and observe that most approaches focus on reducing either bias or variance, often neglecting the other, while also increasing inference overhead. To fill this gap, we propose Beta (Bagging and Encoder-based Fine-tuning for TabPFN Adaptation), a novel and effective method designed to minimize both bias and variance. To reduce bias, we introduce a lightweight encoder to better align downstream tasks with the pre-trained TabPFN. By increasing the number of encoders in a lightweight manner, Beta mitigates variance, thereby further improving the model's performance. Additionally, bootstrapped sampling is employed to further reduce the impact of data perturbations on the model, all while maintaining computational efficiency during inference. Our approach enhances TabPFN's ability to handle high-dimensional data and scale to larger datasets. Experimental results on over 200 benchmark classification datasets demonstrate that Beta either outperforms or matches state-of-the-art methods.
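The variance-reduction idea behind bootstrapped sampling can be sketched generically: fit several copies of a base learner on resampled training sets and aggregate their predictions. The sketch below is not the Beta method; the nearest-centroid base learner is a stand-in for TabPFN, and all function names and the `n_models` parameter are hypothetical choices for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) examples with replacement."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


def fit_centroids(rows):
    """Stand-in base learner: the per-class mean of the feature vectors."""
    sums, counts = {}, Counter()
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(centroids, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )


def bagged_predict(rows, x, n_models=25, seed=0):
    """Majority vote over base learners fit on bootstrap resamples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        model = fit_centroids(bootstrap_sample(rows, rng))
        votes[predict_centroid(model, x)] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: two well-separated clusters labeled 0 and 1.
    data = [((0.1 * i, 0.05 * i), 0) for i in range(10)]
    data += [((5.0 + 0.1 * i, 5.0 - 0.05 * i), 1) for i in range(10)]
    print(bagged_predict(data, (0.2, 0.1)))
```

Because each resample perturbs the training set, averaging or voting across the fitted models tends to damp the effect of any single perturbation on the final prediction.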


Online Gradient Boosting Decision Tree: In-Place Updates for Efficient Adding/Deleting Data

arXiv.org Machine Learning

Gradient Boosting Decision Tree (GBDT) is one of the most popular machine learning models in various applications. However, in the traditional setting, all data must be accessible simultaneously during training: data instances cannot be added or deleted after training. In this paper, we propose an efficient online learning framework for GBDT supporting both incremental and decremental learning. To the best of our knowledge, this is the first work that considers in-place unified incremental and decremental learning on GBDT. To reduce the learning cost, we present a collection of optimizations for our framework, so that it can add or delete a small fraction of data on the fly. We theoretically show the relationship between the hyper-parameters of the proposed optimizations, which enables trading off accuracy and cost in incremental and decremental learning. The backdoor attack results show that our framework can successfully inject and remove a backdoor in a well-trained model using incremental and decremental learning, and the empirical results on public datasets confirm the effectiveness and efficiency of our proposed online learning framework and optimizations.