Goto

Collaborating Authors

 Decision Tree Learning


Reviews: Provably robust boosted decision stumps and trees against adversarial attacks

Neural Information Processing Systems

As a main contribution, the authors derive an exact attack algorithm on ensembles of decision stumps for \ell_\infty perturbations. In contrast, this problem is known to be NP-Hard for trees with at least 3 internal nodes, via previous work by Kantchelian et.


Reviews: Provably robust boosted decision stumps and trees against adversarial attacks

Neural Information Processing Systems

Thank you for your submission to NeurIPS. After the author response and discussion, the reviewers and I are in agreement that this work presents an interesting and substantial contribution to the work on provably robust adversarial learning. The extension of such methods from the typical NN setting to one of boosted decision stumps is an interesting one, and certainly worthy of publication. The author response in particular was good at addressing the points of one of the initially most negative reviewer, and it would be good to include these points into the final version.


GPT-HTree: A Decision Tree Framework Integrating Hierarchical Clustering and Large Language Models for Explainable Classification

arXiv.org Artificial Intelligence

Decision trees are fundamental tools in machine learning (ML), prized for their interpretability and simplicity in classification tasks. By providing clear decision paths, they enable users to understand and trust the reasoning behind predictions. However, their effectiveness diminishes when applied to heterogeneous datasets comprising entities with varying characteristics. Uniform decision paths often fail to account for the nuanced differences among diverse segments, leading to oversimplified or misleading classifications. Unsupervised clustering methods, on the other hand, excel in discovering latent structures within complex datasets. These methods, including hierarchical clustering, k-means, and DBSCAN, are powerful tools for segmenting populations into meaningful clusters without requiring predefined labels. While they are effective for uncovering hidden patterns, their primary drawback is a lack of explainability. Clusters produced by unsupervised methods often lack intuitive descriptions or actionable insights, making it difficult to interpret their relevance or apply them in practical decision-making scenarios.


Longitudinal Missing Data Imputation for Predicting Disability Stage of Patients with Multiple Sclerosis

arXiv.org Artificial Intelligence

Multiple Sclerosis (MS) is a chronic disease characterized by progressive or alternate impairment of neurological functions (motor, sensory, visual, and cognitive). Predicting disease progression with a probabilistic and time-dependent approach might help in suggesting interventions that can delay the progression of the disease. However, extracting informative knowledge from irregularly collected longitudinal data is difficult, and missing data pose significant challenges. MS progression is measured through the Expanded Disability Status Scale (EDSS), which quantifies and monitors disability in MS over time. EDSS assesses impairment in eight functional systems (FS). Frequently, only the EDSS score assigned by clinicians is reported, while FS sub-scores are missing. Imputing these scores might be useful, especially to stratify patients according to their phenotype assessed over the disease progression. This study aimed at i) exploring different methodologies for imputing missing FS sub-scores, and ii) predicting the EDSS score using complete clinical data. Results show that Exponential Weighted Moving Average achieved the lowest error rate in the missing data imputation task; furthermore, the combination of Classification and Regression Trees for the imputation and SVM for the prediction task obtained the best accuracy.


Review for NeurIPS paper: A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees

Neural Information Processing Systems

Clarity: The main paper is mostly written fairly well, the Appendix less so (lots of typos at least). The work nevertheless lacks clarity because several relevant details are moved to the Supplementary part, and some aspects are not mentioned at all (at least in the main paper). The Appendix even contains a section regarding categorical features that is not even hinted at in the main paper. Clarification is needed, e.g., at the following points: - p.2, l.70-73 is too vague, the meaning is unclear - pls. clarify - p.2, l. 85f: clarify what "[...] i enters leaf node l " means (i.e., that data pt. If \hat{y}_i denotes a predicted label, then why is it real-valued and not in [Y]? (Also regarding the description on p.3, l.96f: why should y_i - \hat{y}_i \geq 1 here -- \hat{y}_i is in R, so couldn't it be, say, y_i - delta for some small delta?) - p.3, l.92: perhaps clarify "tree sparsity" -- actually here this means sparsity of the decision hyperplanes, no the tree itself - The 1-norm is used in the MIP (1) and several times in the text later called "linear" (e.g., p.4, l.136), but this is technically incorrect.


Review for NeurIPS paper: A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees

Neural Information Processing Systems

This paper is about employing advances in computational efficiency of mixed integer programming methods towards decision tree construction problems. While locally optimal methods can achieve an upper bound on the minimization problem efficiently, closing the optimality gap requires tight lower bounds. The authors use an interval relaxation and a support-vector machine procedure to tighten the lower bound. To scale the algorithm, the authors use a LP-based data selection procedure, and perform all experiments using this procedure. It is not clear whether the global optimality properties of the MIP formulation carry through with the data-selection procedure.


Reviews: Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale

Neural Information Processing Systems

The paper is well written, and its structure is adapted to the content. Upon reading the paper, one might think that the contribution resides in the vertical splitting of the data over the workers, but the state of the art study presented later on shows that this idea by itself is not new. The novelty comes from associating it with data also distributed vertically, sparse bit vectors for inter-node communications, feature compression with custom data structures and training on compressed data. The paper shows formally and experimentally how the proposed heuristics significantly improve the communication between the nodes and speed up training. The remark that using run-length encoding for the features allows them to hold in the L3 cache, thus decreasing the number of DRAM accesses, doesn't seem to always be true. The paper should explain in which conditions this is true (size of the cache, size of the data, number and type of features, etc.).


Reviews: Pruning Random Forests for Prediction on a Budget

Neural Information Processing Systems

The idea of taking into account feature costs when pruning tree ensembles is original to the best of my knowledge. The main originality of the proposed approach is the fact that it adopts a bottom-up post-pruning strategy, while most existing approaches are top-down, acting during tree growing. While the authors present this feature as an advantage of their method, actually, I'm not convinced that adopting a bottom-up strategy is a good idea for addressing this problem. Since the algorithm indeed can not modify the existing tree structure (it can only prune it), it should be less efficient in terms of feature cost reduction than top-down methods that can have a direct impact on the features selected at tree nodes. For example, let us assume that two very important features in the dataset carry on the exact same information about the output (i.e, they are redundant).


Reviews: A Communication-Efficient Parallel Algorithm for Decision Tree

Neural Information Processing Systems

Given the popularity of decision trees, proposing an efficient parallel implementation of this method is of course very relevant. The proposed parallelization is original with respect to existing methods and it should indeed lead to less communications than other methods. The theoretical analysis is sound and I like the discussion of the impact of the main problem and method parameters that follows from the lower bound provided in theorem 4.1. Experiments are conducted on two very large problems, where, in the limit of the tested settings (see below), PV-tree is clearly shown to outperform other parallel implementations, in terms of both computing times to reach a given accuracy level and communication costs. I nevertheless have two major concerns with the proposed parallelization.


Reviews: Algebraic tests of general Gaussian latent tree models

Neural Information Processing Systems

Paper Summary: The paper presents a technique for testing whether a given set of samples are drawn from a postulated Gaussian latent tree model or a saturated Gaussian graphical model. The paper first characterizes a set of necessary and sufficient constraints that any covariance matrix of a Gaussian latent tree model should satisfy. It then uses these constraints to come up with a test statistic. The paper extends past work on testing for Gaussian latent tree models to settings where the observed variables are allowed to have degree up to 2. The test statistic presented in the paper is based on gaussian approximation for maxima of high dimensional sums. Simulations suggest that the test statistic can potentially work in high dimensional settings.