Coresets for Decision Trees of Signals
–Neural Information Processing Systems
A k -decision tree t (or k -tree) is a recursive partition of a matrix (2D-signal) into k\geq 1 block matrices (axis-parallel rectangles, leaves) where each rectangle is assigned a real label. Its regression or classification loss to a given matrix D of N entries (labels) is the sum of squared differences over every label in D and its assigned label by t .Given an error parameter \varepsilon\in(0,1), a (k,\varepsilon) -coreset C of D is a small summarization that provably approximates this loss to \emph{every} such tree, up to a multiplicative factor of 1\pm\varepsilon . In particular, the optimal k -tree of C is a (1 \varepsilon) -approximation to the optimal k -tree of D .We provide the first algorithm that outputs such a (k,\varepsilon) -coreset for \emph{every} such matrix D . The size C of the coreset is polynomial in k\log(N)/\varepsilon, and its construction takes O(Nk) time.This is by forging a link between decision trees from machine learning -- to partition trees in computational geometry. Experimental results on \texttt{sklearn} and \texttt{lightGBM} show that applying our coresets on real-world data-sets boosts the computation time of random forests and their parameter tuning by up to x 10, while keeping similar accuracy. Full open source code is provided.
Neural Information Processing Systems
Jan-19-2025, 15:28:26 GMT
- Technology: