Handling Missing Data in Decision Trees: A Probabilistic Approach
Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, Guy Van den Broeck
arXiv.org Artificial Intelligence
Decision trees are a popular family of models due to their attractive properties such as interpretability and ability to handle heterogeneous data. Concurrently, missing data is a prevalent occurrence that hinders the performance of machine learning models. As such, handling missing data in decision trees is a well studied problem. In this paper, we tackle this problem by taking a probabilistic approach. At deployment time, we use tractable density estimators to compute the "expected prediction" of our models. At learning time, we fine-tune parameters of already learned trees by minimizing their "expected prediction loss" w.r.t. our density estimators.

However, most of these are heuristics in nature (Twala et al., 2008), tailored towards some specific tree induction algorithm (Chen & Guestrin, 2016; Prokhorenkova et al., 2018), or make strong distributional assumptions about the data, such as the feature distribution factorizing completely (e.g., mean or median imputation (Rubin, 1976)) or according to the tree structure (Quinlan, 1993). As many works have compared the most prominent ones in empirical studies (Batista & Monard, 2003; Saar-Tsechansky & Provost, 2007), there is no clear winner, and ultimately the adoption of a particular strategy in practice boils down to its availability in the ML libraries employed. In this work, we tackle handling missing data in trees at both learning and deployment time from a principled probabilistic perspective.
Jun-29-2020
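To make the "expected prediction" idea concrete, here is a minimal sketch that is not the paper's implementation: the paper computes expectations with tractable density estimators, whereas this toy version assumes an independent Gaussian per feature, so the expectation over a missing feature reduces to weighting a split's two branches by a CDF evaluation. The `Leaf`/`Split` classes and the `expected_prediction` function are illustrative names, not from the paper.

```python
# Sketch of "expected prediction" for a decision tree with missing features,
# under the simplifying assumption of a fully factorized Gaussian density.
from dataclasses import dataclass
from math import erf, sqrt


@dataclass
class Leaf:
    value: float


@dataclass
class Split:
    feature: int
    threshold: float
    left: object   # branch taken when x[feature] <= threshold
    right: object  # branch taken when x[feature] >  threshold


def gaussian_cdf(x, mean, std):
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))


def expected_prediction(node, x, density):
    """Expectation of the tree's output over missing features (x[i] is None).

    `density` maps feature index -> (mean, std) of an independent Gaussian,
    so P(x_i <= threshold) is a single CDF evaluation and the expectation
    is exact under this factorization assumption.
    """
    if isinstance(node, Leaf):
        return node.value
    xi = x[node.feature]
    if xi is not None:  # observed feature: follow the usual branch
        branch = node.left if xi <= node.threshold else node.right
        return expected_prediction(branch, x, density)
    # missing feature: weight both branches by their probability
    mean, std = density[node.feature]
    p_left = gaussian_cdf(node.threshold, mean, std)
    return (p_left * expected_prediction(node.left, x, density)
            + (1.0 - p_left) * expected_prediction(node.right, x, density))
```

For a single split on feature 0 at threshold 0 with a standard normal density, a missing value sends probability 0.5 down each branch, so the expected prediction is the average of the two leaf values; an observed value recovers the ordinary tree prediction.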