Diffusion Boosted Trees
A series of pivotal works in recent years (Song and Ermon, 2019; Ho et al., 2020; Song et al., 2021; Dhariwal and Nichol, 2021; Rombach et al., 2022; Karras et al., 2022) has propelled diffusion-based generative models (Sohl-Dickstein et al., 2015) to the forefront of generative AI. The success of this class of models in content generation has drawn substantial academic and industrial interest. Meanwhile, another line of work, Classification and Regression Diffusion Models (CARD) (Han et al., 2022), tackles supervised learning problems within a denoising diffusion probabilistic modeling framework. It sheds new light both on this foundational machine learning paradigm and on the newest prominent member of the generative AI family. More specifically, CARD learns the target conditional distribution of the response variable y given the covariates x, p(y | x), without imposing explicit parametric assumptions on its probability density function. It makes predictions by exploiting the stochasticity of its output, directly generating samples from this target distribution that resemble y. This framework has demonstrated outstanding results on both regression and image classification tasks. In regression, it can model conditional distributions with flexible statistical attributes and achieves state-of-the-art metrics on real-world datasets. In image classification, it improves on the prediction accuracy of a deterministic classifier and also introduces a novel paradigm for evaluating instance-level prediction confidence. However, CARD models are parameterized by deep neural networks. In contrast, Grinsztajn et al. (2022) have shown that tree-based models remain the state-of-the-art choice for modeling tabular data and can outperform neural networks by a wide margin.
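Because a CARD-style model outputs samples from p(y | x) instead of a single number, those samples must be summarized to obtain a prediction. The sketch below is a minimal illustration under stated assumptions, not CARD's actual procedure. For regression, it reports the sample mean, the median, and an empirical central interval. For classification, it applies one simple scheme we assume for illustration: a majority vote over the argmax of each sampled class-probability vector, with the vote share used as an instance-level confidence. The Gaussian draws in the usage lines are a hypothetical stand-in for a trained model's sampler and are not a diffusion model.

```python
import numpy as np

def summarize_regression_samples(samples, level=0.95):
    """Summarize draws of y from p(y | x) for a single x.

    Returns the mean, the median and an empirical central interval
    containing `level` of the draws.
    """
    samples = np.asarray(samples, dtype=float)
    alpha = 1.0 - level
    lower, upper = np.quantile(samples, [alpha / 2, 1 - alpha / 2])
    return {
        "mean": float(samples.mean()),
        "median": float(np.median(samples)),
        "interval": (float(lower), float(upper)),
    }

def summarize_class_samples(prob_samples):
    """Majority vote over sampled class-probability vectors (assumed scheme).

    prob_samples has shape (n_samples, n_classes). Returns the voted label
    and the fraction of samples that agree with it.
    """
    prob_samples = np.asarray(prob_samples, dtype=float)
    n_classes = prob_samples.shape[1]
    votes = np.bincount(prob_samples.argmax(axis=1), minlength=n_classes)
    label = int(votes.argmax())
    confidence = float(votes[label] / votes.sum())
    return label, confidence

# Hypothetical stand-in for draws from a trained model: plain Gaussian noise.
rng = np.random.default_rng(0)
y_draws = rng.normal(loc=2.0, scale=0.5, size=1000)
reg_summary = summarize_regression_samples(y_draws)

# Four hypothetical sampled probability vectors for one input over 3 classes.
prob_draws = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.8, 0.1, 0.1],
])
label, conf = summarize_class_samples(prob_draws)
print(label, conf)  # prints "0 0.75": three of four samples vote for class 0
```

A wide interval or a low vote share signals that the model is uncertain about this particular input, even when the point prediction is the same. This is the kind of instance-level information that a single deterministic output cannot convey.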
Tabular data is a crucial type of data for many supervised learning tasks. It has a table-format structure similar to a spreadsheet or a relational database table, in which each row represents an individual record or observation and each column represents a feature or attribute of that record.
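To connect this row/column view with the p(y | x) notation used above, the sketch below treats one column of a small table as the response y and the remaining columns as the covariates x. The column names, the values, and the helper `split_table` are all hypothetical and are chosen only for illustration.

```python
import numpy as np

# Hypothetical table: each row is one record, each column one attribute.
columns = ["age", "income", "num_purchases", "spend"]
rows = [
    [34, 52000.0, 3, 120.5],
    [51, 78000.0, 7, 410.0],
    [27, 31000.0, 1, 35.2],
]

def split_table(rows, columns, target):
    """Split a row-major table into covariates X and response y (hypothetical helper)."""
    table = np.asarray(rows, dtype=float)       # shape (n_records, n_columns)
    j = columns.index(target)                   # position of the response column
    X = np.delete(table, j, axis=1)             # all other columns become features
    y = table[:, j]
    feature_names = [c for c in columns if c != target]
    return X, y, feature_names

X, y, names = split_table(rows, columns, target="spend")
print(X.shape, y.shape, names)  # (3, 3) (3,) ['age', 'income', 'num_purchases']
```

Each row of `X` together with the matching entry of `y` forms one (x, y) pair. A supervised model for tabular data, whether a neural network or a tree ensemble, is fit on such pairs.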
Jun-3-2024