Goto

Collaborating Authors

 Decision Tree Learning


Discovering Drug-Drug and Drug-Disease Interactions Inducing Acute Kidney Injury Using Deep Rule Forests

arXiv.org Machine Learning

Patients with Acute Kidney Injury (AKI) increase mortality, morbidity, and long-term adverse events. Therefore, early identification of AKI may improve renal function recovery, decrease comorbidities, and further improve patients' survival. To control certain risk factors and develop targeted prevention strategies are important to reduce the risk of AKI. Drug-drug interactions and drug-disease interactions are critical issues for AKI. Typical statistical approaches cannot handle the complexity of drug-drug and drug-disease interactions. In this paper, we propose a novel learning algorithm, Deep Rule Forests (DRF), which discovers rules from multilayer tree models as the combinations of drug usages and disease indications to help identify such interactions. We found that several disease and drug usages are considered having significant impact on the occurrence of AKI. Our experimental results also show that the DRF model performs comparatively better than typical tree-based and other state-of-the-art algorithms in terms of prediction accuracy and model interpretability.


Model Distillation for Revenue Optimization: Interpretable Personalized Pricing

arXiv.org Machine Learning

Data-driven pricing strategies are becoming increasingly common, where customers are offered a personalized price based on features that are predictive of their valuation of a product. It is desirable to have this pricing policy be simple and interpretable, so it can be verified, checked for fairness, and easily implemented. However, efforts to incorporate machine learning into a pricing framework often lead to complex pricing policies which are not interpretable, resulting in slow adoption in practice. We present a customized, prescriptive tree-based algorithm that distills knowledge from a complex black box machine learning algorithm, segments customers with similar valuations and prescribes prices in such a way that maximizes revenue while maintaining interpretability. We quantify the regret of a resulting policy and demonstrate its efficacy in applications with both synthetic and real-world datasets.


On Symbolically Encoding the Behavior of Random Forests

arXiv.org Artificial Intelligence

Recent work has shown that the input-output behavior of some machine learning systems can be captured symbolically using Boolean expressions or tractable Boolean circuits, which facilitates reasoning about the behavior of these systems. While most of the focus has been on systems with Boolean inputs and outputs, we address systems with discrete inputs and outputs, including ones with discretized continuous variables as in systems based on decision trees. We also focus on the suitability of encodings for computing prime implicants, which have recently played a central role in explaining the decisions of machine learning systems. We show some key distinctions with encodings for satisfiability, and propose an encoding that is sound and complete for the given task.


Explaining predictive models with mixed features using Shapley values and conditional inference trees

arXiv.org Machine Learning

It is becoming increasingly important to explain complex, black-box machine learning models. Although there is an expanding literature on this topic, Shapley values stand out as a sound method to explain predictions from any type of machine learning model. The original development of Shapley values for prediction explanation relied on the assumption that the features being described were independent. This methodology was then extended to explain dependent features with an underlying continuous distribution. In this paper, we propose a method to explain mixed (i.e. continuous, discrete, ordinal, and categorical) dependent features by modeling the dependence structure of the features using conditional inference trees. We demonstrate our proposed method against the current industry standards in various simulation studies and find that our method often outperforms the other approaches. Finally, we apply our method to a real financial data set used in the 2018 FICO Explainable Machine Learning Challenge and show how our explanations compare to the FICO challenge Recognition Award winning team.


ExKMC: Expanding Explainable $k$-Means Clustering

arXiv.org Machine Learning

Despite the popularity of explainable AI, there is limited work on effective methods for unsupervised learning. We study algorithms for $k$-means clustering, focusing on a trade-off between explainability and accuracy. Following prior work, we use a small decision tree to partition a dataset into $k$ clusters. This enables us to explain each cluster assignment by a short sequence of single-feature thresholds. While larger trees produce more accurate clusterings, they also require more complex explanations. To allow flexibility, we develop a new explainable $k$-means clustering algorithm, ExKMC, that takes an additional parameter $k' \geq k$ and outputs a decision tree with $k'$ leaves. We use a new surrogate cost to efficiently expand the tree and to label the leaves with one of $k$ clusters. We prove that as $k'$ increases, the surrogate cost is non-increasing, and hence, we trade explainability for accuracy. Empirically, we validate that ExKMC produces a low cost clustering, outperforming both standard decision tree methods and other algorithms for explainable clustering. Implementation of ExKMC available at https://github.com/navefr/ExKMC.



Inference in Bayesian Additive Vector Autoregressive Tree Models

arXiv.org Machine Learning

Vector autoregressive (VAR) models assume linearity between the endogenous variables and their lags. This linearity assumption might be overly restrictive and could have a deleterious impact on forecasting accuracy. As a solution, we propose combining VAR with Bayesian additive regression tree (BART) models. The resulting Bayesian additive vector autoregressive tree (BAVART) model is capable of capturing arbitrary non-linear relations between the endogenous variables and the covariates without much input from the researcher. Since controlling for heteroscedasticity is key for producing precise density forecasts, our model allows for stochastic volatility in the errors. Using synthetic and real data, we demonstrate the advantages of our methods. For Eurozone data, we show that our nonparametric approach improves upon commonly used forecasting models and that it produces impulse responses to an uncertainty shock that are consistent with established findings in the literature.


Handling Missing Data in Decision Trees: A Probabilistic Approach

arXiv.org Artificial Intelligence

However, most of these are heuristics in nature (Twala et al., 2008), tailored towards some specific tree induction algorithm Decision trees are a popular family of models (Chen & Guestrin, 2016; Prokhorenkova et al., 2018), due to their attractive properties such as interpretability or make strong distributional assumptions about the data, and ability to handle heterogeneous such as the feature distribution factorizing completely (e.g., data. Concurrently, missing data is a prevalent mean, median imputation (Rubin, 1976)) or according to the occurrence that hinders performance of machine tree structure (Quinlan, 1993). As many works have compared learning models. As such, handling missing data the most prominent ones in empirical studies (Batista in decision trees is a well studied problem. In & Monard, 2003; Saar-Tsechansky & Provost, 2007), there this paper, we tackle this problem by taking a is no clear winner and ultimately, the adoption of a particular probabilistic approach. At deployment time, we strategy in practice boils down to its availability in the use tractable density estimators to compute the ML libraries employed. "expected prediction" of our models. At learning time, we fine-tune parameters of already learned In this work, we tackle handling missing data in trees at trees by minimizing their "expected prediction both learning and deployment time from a principled probabilistic loss" w.r.t.


Reducing Risk of Model Inversion Using Privacy-Guided Training

arXiv.org Machine Learning

Machine learning models often pose a threat to the privacy of individuals whose data is part of the training set. Several recent attacks have been able to infer sensitive information from trained models, including model inversion or attribute inference attacks. These attacks are able to reveal the values of certain sensitive features of individuals who participated in training the model. It has also been shown that several factors can contribute to an increased risk of model inversion, including feature influence. We observe that not all features necessarily share the same level of privacy or sensitivity. In many cases, certain features used to train a model are considered especially sensitive and therefore propitious candidates for inversion. We present a solution for countering model inversion attacks in tree-based models, by reducing the influence of sensitive features in these models. This is an avenue that has not yet been thoroughly investigated, with only very nascent previous attempts at using this as a countermeasure against attribute inference. Our work shows that, in many cases, it is possible to train a model in different ways, resulting in different influence levels of the various features, without necessarily harming the model's accuracy. We are able to utilize this fact to train models in a manner that reduces the model's reliance on the most sensitive features, while increasing the importance of less sensitive features. Our evaluation confirms that training models in this manner reduces the risk of inference for those features, as demonstrated through several black-box and white-box attacks.


Do Decision Trees need Feature Scaling?

#artificialintelligence

Machine Learning algorithms have always been on the path towards evolution since its inception. Today the domain has come a long way from mathematical modelling to ensemble modelling and more. This evolution has seen more robust and SOTA models which is almost bridging the gap between potentials capabilities of human and AI. Ensemble modelling has given us one of those SOTA model XGBoost. Recently I happened to participate in a Machine Learning Hiring Challenge where the problem statement was a classification problem.