model reliance
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > Strength High (0.46)
- Research Report > New Finding (0.46)
Supplementary Material: Model Class Reliance for Random Forests
Replication is supported by four hosted Python notebooks that reproduce the paper's results. When tested, the hosted runtimes were running Python 3.6.9. The packages developed as part of this work are discussed below and made available via these notebooks. The code is written as an extension to the sklearn RandomForestRegressor and RandomForestClassifier classes; if the notebooks are run on a hosted instance, it is installed automatically. The wrapper calls the R code from the lead author's GitHub; if the notebooks are run on a hosted instance, it is likewise installed automatically.
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.15)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Africa > East Africa (0.04)
The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance
Jon Donnelly, Srikar Katta, Cynthia Rudin, Edward P. Browne
Quantifying variable importance is essential for answering high-stakes questions in fields like genetics, public policy, and medicine. Current methods generally calculate variable importance for a given model trained on a given dataset. However, for a given dataset, there may be many models that explain the target outcome equally well; without accounting for all possible explanations, different researchers may arrive at many conflicting yet equally valid conclusions given the same data. Additionally, even when accounting for all possible explanations for a given dataset, these insights may not generalize because not all good explanations are stable across reasonable data perturbations. We propose a new variable importance framework that quantifies the importance of a variable across the set of all good models and is stable across the data distribution. Our framework is extremely flexible and can be integrated with most existing model classes and global variable importance metrics. We demonstrate through experiments that our framework recovers variable importance rankings for complex simulation setups where other methods fail. Further, we show that our framework accurately estimates the true importance of a variable for the underlying data distribution. We provide theoretical guarantees on the consistency and finite sample error rates for our estimator. Finally, we demonstrate its utility with a real-world case study exploring which genes are important for predicting HIV load in persons with HIV, highlighting an important gene that has not previously been studied in connection with HIV. Code is available at https://github.com/jdonnelly36/Rashomon_Importance_Distribution.
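The idea in the abstract above (importance measured over all good models, and stabilized over data perturbations) can be illustrated with a small, self-contained sketch. This is not the authors' implementation, which is in the linked repository. Everything here is an assumption made for illustration: the synthetic data, the tiny grid of two-coefficient linear models standing in for a model class, permutation importance as the importance metric, the tolerance `eps`, and all function names.

```python
import random

def make_data(rng, n=150):
    # Hypothetical data: x2 is a noisy copy of x1, and y depends on x1 only.
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = x1 + rng.gauss(0, 0.1)
        rows.append((x1, x2, x1 + rng.gauss(0, 0.5)))
    return rows

def loss(w, rows):
    # Mean squared error of the linear model y_hat = w[0]*x1 + w[1]*x2.
    return sum((w[0] * a + w[1] * b - t) ** 2 for a, b, t in rows) / len(rows)

def importance(w, rows, j, rng):
    # Permutation importance: increase in loss after shuffling feature j.
    col = [r[j] for r in rows]
    rng.shuffle(col)
    shuffled = [(c, r[1], r[2]) if j == 0 else (r[0], c, r[2])
                for r, c in zip(rows, col)]
    return loss(w, shuffled) - loss(w, rows)

def rashomon_set(rows, grid, eps):
    # All models in the grid whose loss is within a factor (1 + eps) of the best.
    losses = [loss(w, rows) for w in grid]
    cutoff = (1 + eps) * min(losses)
    return [w for w, l in zip(grid, losses) if l <= cutoff]

def bootstrap_importances(rows, grid, j, n_boot, eps, rng):
    # For each bootstrap resample, the importance of feature j for every good model.
    per_boot = []
    for _ in range(n_boot):
        boot = [rng.choice(rows) for _ in rows]
        good = rashomon_set(boot, grid, eps)
        per_boot.append([importance(w, boot, j, rng) for w in good])
    return per_boot

def cdf(per_boot, k):
    # Average over bootstraps of the share of good models with importance <= k.
    return sum(sum(v <= k for v in vals) / len(vals)
               for vals in per_boot) / len(per_boot)

rng = random.Random(0)
rows = make_data(rng)
grid = [(i / 5, k / 5) for i in range(-2, 8) for k in range(-2, 8)]  # 100 models
per_boot = bootstrap_importances(rows, grid, j=0, n_boot=20, eps=0.05, rng=rng)
thresholds = [0.0, 0.5, 1.0, 2.0, 4.0]
cdf_values = [cdf(per_boot, k) for k in thresholds]
for k, v in zip(thresholds, cdf_values):
    print(f"share of good models with importance of x1 <= {k}: {v:.2f}")
```

Because the two features are nearly interchangeable in this toy setup, good models can place weight on either one, so the importance of x1 is spread over a range rather than concentrated at a single value. Averaging per-bootstrap shares, rather than reading importance off one fitted model, is the part of the sketch meant to mirror the stability argument in the abstract.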
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.92)
- Research Report > Experimental Study (0.92)
Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models
Variable importance is central to scientific studies in the social sciences, causal inference, healthcare, and other domains. However, current notions of variable importance are often tied to a specific predictive model. This is problematic: what if there were multiple well-performing predictive models, and a specific variable is important to some of them and not to others? In that case, we may not be able to tell from a single well-performing model whether a variable is always important in predicting the outcome. Rather than depending on variable importance for a single predictive model, we would like to explore variable importance for all approximately-equally-accurate predictive models. This work introduces the concept of a variable importance cloud, which maps every variable to its importance for every good predictive model. We show properties of the variable importance cloud and draw connections to other areas of statistics. We introduce variable importance diagrams as a projection of the variable importance cloud into two dimensions for visualization purposes. Experiments with criminal justice and marketing data illustrate how variables can change dramatically in importance for approximately-equally-accurate predictive models.
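A variable importance cloud, as described in the abstract above, can be sketched on a toy problem: enumerate a small model class, keep the approximately-equally-accurate models, and record one importance vector per model. This is a minimal illustration under stated assumptions, not the paper's method or code. The synthetic data, the grid of linear models, the ratio form of model reliance (permuted loss divided by original loss), and the tolerance `eps` are all chosen here for illustration.

```python
import random

rng = random.Random(0)

# Hypothetical data: x2 is a noisy copy of x1, and y depends on x1 only.
X, y = [], []
for _ in range(200):
    x1 = rng.gauss(0, 1)
    x2 = x1 + rng.gauss(0, 0.1)
    X.append((x1, x2))
    y.append(x1 + rng.gauss(0, 0.5))

def mse(w, X, y):
    # Mean squared error of the linear model y_hat = w[0]*x1 + w[1]*x2.
    return sum((w[0] * a + w[1] * b - t) ** 2 for (a, b), t in zip(X, y)) / len(y)

def reliance(w, X, y, j, rng):
    # Model reliance on feature j: loss with feature j shuffled over original loss.
    col = [row[j] for row in X]
    rng.shuffle(col)
    Xp = [(c, row[1]) if j == 0 else (row[0], c) for row, c in zip(X, col)]
    return mse(w, Xp, y) / mse(w, X, y)

# Model class: a grid of coefficient pairs (w1, w2) in [-0.5, 1.5] x [-0.5, 1.5].
grid = [(i / 10, k / 10) for i in range(-5, 16) for k in range(-5, 16)]
losses = {w: mse(w, X, y) for w in grid}

# "Good" models: loss within a factor (1 + eps) of the best model in the grid.
eps = 0.05
threshold = (1 + eps) * min(losses.values())
good = [w for w in grid if losses[w] <= threshold]

# The cloud: one point (reliance on x1, reliance on x2) per good model.
cloud = [(reliance(w, X, y, 0, rng), reliance(w, X, y, 1, rng)) for w in good]

for j in range(2):
    vals = [p[j] for p in cloud]
    print(f"feature {j}: reliance from {min(vals):.2f} to {max(vals):.2f} "
          f"across {len(good)} good models")
```

With only two variables, the list `cloud` can be plotted directly as a scatter of one reliance against the other, which corresponds to the two-dimensional projection the abstract calls a variable importance diagram.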
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Florida > Broward County (0.04)