Coresets for Decision Trees of Signals

Jan-19-2025, 15:28:26 GMT–Neural Information Processing Systems

A k -decision tree t (or k -tree) is a recursive partition of a matrix (2D-signal) into k\geq 1 block matrices (axis-parallel rectangles, leaves) where each rectangle is assigned a real label. Its regression or classification loss to a given matrix D of N entries (labels) is the sum of squared differences over every label in D and its assigned label by t .Given an error parameter \varepsilon\in(0,1), a (k,\varepsilon) -coreset C of D is a small summarization that provably approximates this loss to \emph{every} such tree, up to a multiplicative factor of 1\pm\varepsilon . In particular, the optimal k -tree of C is a (1 \varepsilon) -approximation to the optimal k -tree of D .We provide the first algorithm that outputs such a (k,\varepsilon) -coreset for \emph{every} such matrix D . The size C of the coreset is polynomial in k\log(N)/\varepsilon, and its construction takes O(Nk) time.This is by forging a link between decision trees from machine learning -- to partition trees in computational geometry. Experimental results on \texttt{sklearn} and \texttt{lightGBM} show that applying our coresets on real-world data-sets boosts the computation time of random forests and their parameter tuning by up to x 10, while keeping similar accuracy. Full open source code is provided.

coreset, decision tree, varepsilon, (4 more...)

Neural Information Processing Systems

Jan-19-2025, 15:28:26 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Decision Tree Learning (1.00)
  - Representation & Reasoning > Diagnosis (0.88)