Handling Missing Data in Decision Trees: A Probabilistic Approach
Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, Guy Van den Broeck
arXiv.org Artificial Intelligence
Decision trees are a popular family of models due to their attractive properties such as interpretability and ability to handle heterogeneous data. Concurrently, missing data is a prevalent occurrence that hinders the performance of machine learning models. As such, handling missing data in decision trees is a well studied problem. In this paper, we tackle this problem by taking a probabilistic approach. At deployment time, we use tractable density estimators to compute the "expected prediction" of our models. At learning time, we fine-tune parameters of already learned trees by minimizing their "expected prediction loss" w.r.t. our density estimators.

However, most of these are heuristics in nature (Twala et al., 2008), tailored towards some specific tree induction algorithm (Chen & Guestrin, 2016; Prokhorenkova et al., 2018), or make strong distributional assumptions about the data, such as the feature distribution factorizing completely (e.g., mean or median imputation (Rubin, 1976)) or according to the tree structure (Quinlan, 1993). As many works have compared the most prominent ones in empirical studies (Batista & Monard, 2003; Saar-Tsechansky & Provost, 2007), there is no clear winner, and ultimately the adoption of a particular strategy in practice boils down to its availability in the ML libraries employed. In this work, we tackle handling missing data in trees at both learning and deployment time from a principled probabilistic perspective.
Jun-29-2020
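To make the "expected prediction" idea concrete, here is a minimal sketch that is not the paper's implementation: the paper computes expectations with tractable density estimators, whereas this toy version assumes an independent Gaussian per feature, so the expectation over a missing feature reduces to weighting a split's two branches by a CDF evaluation. The `Leaf`/`Split` classes and the `expected_prediction` function are illustrative names, not from the paper.

```python
# Sketch of "expected prediction" for a decision tree with missing features,
# under the simplifying assumption of a fully factorized Gaussian density.
from dataclasses import dataclass
from math import erf, sqrt


@dataclass
class Leaf:
    value: float


@dataclass
class Split:
    feature: int
    threshold: float
    left: object   # branch taken when x[feature] <= threshold
    right: object  # branch taken when x[feature] >  threshold


def gaussian_cdf(x, mean, std):
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))


def expected_prediction(node, x, density):
    """Expectation of the tree's output over missing features (x[i] is None).

    `density` maps feature index -> (mean, std) of an independent Gaussian,
    so P(x_i <= threshold) is a single CDF evaluation and the expectation
    is exact under this factorization assumption.
    """
    if isinstance(node, Leaf):
        return node.value
    xi = x[node.feature]
    if xi is not None:  # observed feature: follow the usual branch
        branch = node.left if xi <= node.threshold else node.right
        return expected_prediction(branch, x, density)
    # missing feature: weight both branches by their probability
    mean, std = density[node.feature]
    p_left = gaussian_cdf(node.threshold, mean, std)
    return (p_left * expected_prediction(node.left, x, density)
            + (1.0 - p_left) * expected_prediction(node.right, x, density))
```

For a single split on feature 0 at threshold 0 with a standard normal density, a missing value sends probability 0.5 down each branch, so the expected prediction is the average of the two leaf values; an observed value recovers the ordinary tree prediction.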