Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale

Neural Information Processing Systems

Deep distributed decision trees and tree ensembles have grown in importance due to the need to model increasingly large datasets. However, PLANET, the standard distributed tree learning algorithm implemented in systems such as XGBoost and Spark MLlib, scales poorly as data dimensionality and tree depths grow. We present Yggdrasil, a new distributed tree learning method that outperforms existing methods by up to 24x. Unlike PLANET, Yggdrasil is based on vertical partitioning of the data (i.e., partitioning by feature), along with a set of optimized data structures to reduce the CPU and communication costs of training. Yggdrasil (1) trains directly on compressed data for compressible features and labels; (2) introduces efficient data structures for training on uncompressed data; and (3) minimizes communication between nodes by using sparse bitvectors. Moreover, while PLANET approximates split points through feature binning, Yggdrasil does not require binning, and we analytically characterize the impact of this approximation. We evaluate Yggdrasil on the MNIST 8M dataset and a high-dimensional dataset at Yahoo; on both, Yggdrasil is faster by up to an order of magnitude.
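
The abstract describes vertical (by-feature) partitioning combined with bitvector communication. The fragment below is a minimal, single-process sketch of that idea: each "worker" owns whole feature columns, finds the best split for its own columns locally, and only a compact bitvector of row assignments would need to cross the network. The variance-based impurity, the NumPy boolean mask standing in for a sparse bitvector, and all names are illustrative assumptions, not details taken from the Yggdrasil implementation.

import numpy as np

def best_split_for_feature(values, labels):
    """Scan one feature column and return (threshold, impurity) of its best split."""
    order = np.argsort(values)
    v, y = values[order], labels[order]
    best_threshold, best_impurity = None, np.inf
    for i in range(1, len(v)):
        if v[i] == v[i - 1]:
            continue
        left, right = y[:i], y[i:]
        # Weighted variance of the two children as the impurity of this split.
        impurity = left.var() * len(left) + right.var() * len(right)
        if impurity < best_impurity:
            best_threshold, best_impurity = (v[i - 1] + v[i]) / 2.0, impurity
    return best_threshold, best_impurity

def split_node(feature_columns, labels):
    """Each 'worker' owns whole columns: it searches its own features locally,
    and only the winning feature's row assignments would be broadcast."""
    candidates = {name: best_split_for_feature(col, labels)
                  for name, col in feature_columns.items()}
    feature, (threshold, _) = min(candidates.items(), key=lambda kv: kv[1][1])
    # Encode left/right membership as a bitvector (here: a boolean mask) so that
    # every worker can partition its own columns identically.
    bitvector = feature_columns[feature] <= threshold
    return feature, threshold, bitvector

rng = np.random.default_rng(0)
columns = {"f0": rng.normal(size=1000), "f1": rng.normal(size=1000)}
labels = 3.0 * columns["f0"] + rng.normal(scale=0.1, size=1000)
feature, threshold, bits = split_node(columns, labels)
print(feature, round(threshold, 3), int(bits.sum()), "rows go left")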


Reviews: Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale

Neural Information Processing Systems

The paper is well written, and its structure suits the content. On a first reading, one might think the contribution lies in vertically splitting the data across the workers, but the survey of prior work presented later shows that this idea by itself is not new. The novelty comes from combining vertical partitioning with sparse bit vectors for inter-node communication, feature compression with custom data structures, and training directly on the compressed data. The paper shows formally and experimentally that the proposed techniques significantly reduce communication between the nodes and speed up training. However, the remark that run-length encoding the features allows them to fit in the L3 cache, thereby decreasing the number of DRAM accesses, does not always seem to hold. The paper should explain under which conditions it is true (size of the cache, size of the data, number and type of features, etc.).
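
The reviewer's caveat can be checked with a back-of-envelope calculation: whether a run-length-encoded column fits in the L3 cache depends on how many runs it contains. The sketch below is illustrative only; the 8 MB cache size, the 4 M row count, the byte widths, and the two synthetic columns are assumptions, not figures from the paper or the review.

def rle_size_bytes(column, value_bytes, length_bytes=4):
    """Size in bytes of a (value, run-length) encoding of a sequence."""
    runs = 1
    for prev, cur in zip(column, column[1:]):
        if cur != prev:
            runs += 1
    return runs * (value_bytes + length_bytes)

L3_CACHE_BYTES = 8 * 1024 * 1024  # assumed 8 MB shared L3 cache

n_rows = 4_000_000
# Low-cardinality column stored sorted by value (e.g. after binning): few runs.
binned_sorted = [v for v in range(256) for _ in range(n_rows // 256)]
# High-cardinality column whose neighbouring values rarely repeat: ~one run per row.
distinct = list(range(n_rows))

for name, column, value_bytes in [("sorted 8-bit bins", binned_sorted, 1),
                                  ("distinct 32-bit values", distinct, 4)]:
    size = rle_size_bytes(column, value_bytes)
    verdict = "fits" if size <= L3_CACHE_BYTES else "does not fit"
    print(f"{name}: {size:,} bytes after RLE ({verdict} in the assumed L3 cache)")

Under these assumptions the sorted, binned column compresses to a few kilobytes and easily fits, while the high-cardinality column compresses hardly at all and spills to DRAM, which is exactly the distinction the reviewer asks the paper to spell out.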



