Goto

Collaborating Authors

 Decision Tree Learning


Harnessing the power of choices in decision tree learning

Neural Information Processing Systems

We propose a simple generalization of standard and empirically successful decision tree learning algorithms such as ID3, C4.5, and CART. These algorithms, which have been central to machine learning for decades, are greedy in nature: they grow a decision tree by iteratively splitting on the best attribute. Our algorithm, Top- k, considers the k best attributes as possible splits instead of just the single best attribute. We demonstrate, theoretically and empirically, the power of this simple generalization. We first prove a greediness hierarchy theorem showing that for every k\in \mathbb{N}, Top- (k 1) can be dramatically more powerful than Top- k: there are data distributions for which the former achieves accuracy 1-\epsilon, whereas the latter only achieves accuracy \frac{1}{2} \epsilon .


Feature Learning for Interpretable, Performant Decision Trees

Neural Information Processing Systems

Decision trees are regarded for high interpretability arising from their hierarchical partitioning structure built on simple decision rules. However, in practice, this is not realized because axis-aligned partitioning of realistic data results in deep trees, and because ensemble methods are used to mitigate overfitting. Even then, model complexity and performance remain sensitive to transformation of the input, and extensive expert crafting of features from the raw data is common. We propose the first system to alternate sparse feature learning with differentiable decision tree construction to produce small, interpretable trees with good performance. We benchmark against conventional tree-based models and demonstrate several notions of interpretation of a model and its predictions.


SSL4EO-L: Datasets and Foundation Models for Landsat Imagery

Neural Information Processing Systems

The Landsat program is the longest-running Earth observation program in history, with 50 years of data acquisition by 8 satellites. The multispectral imagery captured by sensors onboard these satellites is critical for a wide range of scientific fields. Despite the increasing popularity of deep learning and remote sensing, the majority of researchers still use decision trees and random forests for Landsat image analysis due to the prevalence of small labeled datasets and lack of foundation models. In this paper, we introduce SSL4EO-L, the first ever dataset designed for Self-Supervised Learning for Earth Observation for the Landsat family of satellites (including 3 sensors and 2 product levels) and the largest Landsat dataset in history (5M image patches). Additionally, we modernize and re-release the L7 Irish and L8 Biome cloud detection datasets, and introduce the first ML benchmark datasets for Landsats 4–5 TM and Landsat 7 ETM SR.


Coresets for Decision Trees of Signals

Neural Information Processing Systems

A k -decision tree t (or k -tree) is a recursive partition of a matrix (2D-signal) into k\geq 1 block matrices (axis-parallel rectangles, leaves) where each rectangle is assigned a real label. Its regression or classification loss to a given matrix D of N entries (labels) is the sum of squared differences over every label in D and its assigned label by t .Given an error parameter \varepsilon\in(0,1), a (k,\varepsilon) -coreset C of D is a small summarization that provably approximates this loss to \emph{every} such tree, up to a multiplicative factor of 1\pm\varepsilon . In particular, the optimal k -tree of C is a (1 \varepsilon) -approximation to the optimal k -tree of D .We provide the first algorithm that outputs such a (k,\varepsilon) -coreset for \emph{every} such matrix D . The size C of the coreset is polynomial in k\log(N)/\varepsilon, and its construction takes O(Nk) time.This is by forging a link between decision trees from machine learning -- to partition trees in computational geometry. Experimental results on \texttt{sklearn} and \texttt{lightGBM} show that applying our coresets on real-world data-sets boosts the computation time of random forests and their parameter tuning by up to x 10, while keeping similar accuracy. Full open source code is provided.


VaRT: Variational Regression Trees

Neural Information Processing Systems

Decision trees are a well-established tool in machine learning for classification and regression tasks. In this paper, we introduce a novel non-parametric Bayesian model that uses variational inference to approximate a posterior distribution over the space of stochastic decision trees. We evaluate the model's performance on 18 datasets and demonstrate its competitiveness with other state-of-the-art methods in regression tasks. We also explore its application to causal inference problems. We provide a fully vectorized implementation of our algorithm in PyTorch.


On the Gini-impurity Preservation For Privacy Random Forests

Neural Information Processing Systems

Random forests have been one successful ensemble algorithms in machine learning. Various techniques have been utilized to preserve the privacy of random forests from anonymization, differential privacy, homomorphic encryption, etc., whereas it rarely takes into account some crucial ingredients of learning algorithm. This work presents a new encryption to preserve data's Gini impurity, which plays a crucial role during the construction of random forests. Our basic idea is to modify the structure of binary search tree to store several examples in each node, and encrypt data features by incorporating label and order information. Theoretically, we prove that our scheme preserves the minimum Gini impurity in ciphertexts without decrypting, and present the security guarantee for encryption.


Decision Tree for Locally Private Estimation with Public Data

Neural Information Processing Systems

We propose conducting locally differentially private (LDP) estimation with the aid of a small amount of public data to enhance the performance of private estimation. Specifically, we introduce an efficient algorithm called Locally differentially Private Decision Tree (LPDT) for LDP regression. We first use the public data to grow a decision tree partition and then fit an estimator according to the partition privately. From a theoretical perspective, we show that LPDT is \varepsilon -LDP and has a mini-max optimal convergence rate under a mild assumption of similarity between public and private data, whereas the lower bound of the convergence rate of LPDT without public data is strictly slower, which implies that the public data helps to improve the convergence rates of LDP estimation. We conduct experiments on both synthetic and real-world data to demonstrate the superior performance of LPDT compared with other state-of-the-art LDP regression methods. Moreover, we show that LPDT remains effective despite considerable disparities between public and private data.


Certifying Robustness to Programmable Data Bias in Decision Trees

Neural Information Processing Systems

Datasets can be biased due to societal inequities, human biases, under-representation of minorities, etc. Our goal is to certify that models produced by a learning algorithm are pointwise-robust to dataset biases. This is a challenging problem: it entails learning models for a large, or even infinite, number of datasets, ensuring that they all produce the same prediction. We focus on decision-tree learning due to the interpretable nature of the models. Our approach allows programmatically specifying \emph{bias models} across a variety of dimensions (e.g., label-flipping or missing data), composing types of bias, and targeting bias towards a specific group.


Lattice partition recovery with dyadic CART

Neural Information Processing Systems

We study piece-wise constant signals corrupted by additive Gaussian noise over a d -dimensional lattice. Data of this form naturally arise in a host of applications, and the tasks of signal detection or testing, de-noising and estimation have been studied extensively in the statistical and signal processing literature. In this paper we consider instead the problem of partition recovery, i.e. of estimating the partition of the lattice induced by the constancy regions of the unknown signal, using the computationally-efficient dyadic classification and regression tree (DCART) methodology proposed by \citep{donoho1997cart}. We prove that, under appropriate regularity conditions on the shape of the partition elements, a DCART-based procedure consistently estimates the underlying partition at a rate of order \sigma 2 k * \log (N)/\kappa 2, where k * is the minimal number of rectangular sub-graphs obtained using recursive dyadic partitions supporting the signal partition, \sigma 2 is the noise variance, \kappa is the minimal magnitude of the signal difference among contiguous elements of the partition and N is the size of the lattice. Furthermore, under stronger assumptions, our method attains a sharper estimation error of order \sigma 2\log(N)/\kappa 2, independent of k *, which we show to be minimax rate optimal.


Fair and Optimal Decision Trees: A Dynamic Programming Approach

Neural Information Processing Systems

Interpretable and fair machine learning models are required for many applications, such as credit assessment and in criminal justice. Decision trees offer this interpretability, especially when they are small. Optimal decision trees are of particular interest because they offer the best performance possible for a given size. However, state-of-the-art algorithms for fair and optimal decision trees have scalability issues, often requiring several hours to find such trees even for small datasets. Previous research has shown that dynamic programming (DP) performs well for optimizing decision trees because it can exploit the tree structure.