Interpretable Machine Learning for Discovery: Statistical Challenges & Opportunities
Allen, Genevera I., Gan, Luqin, Zheng, Lili
arXiv.org Artificial Intelligence
Machine learning systems have gained widespread use in science, technology, and society. Given the increasing number of high-stakes machine learning applications and the growing complexity of machine learning models, many have advocated for interpretability and explainability to promote understanding and trust in machine learning results (Rasheed et al., 2022; Toreini et al., 2020; Broderick et al., 2023). In response, there has been a recent explosion of research on Interpretable Machine Learning (IML), mostly focusing on new techniques to interpret black-box systems; see Molnar (2022), Lipton (2018), Guidotti et al. (2018), Doshi-Velez & Kim (2017), Du et al. (2019), Murdoch et al. (2019), and Carvalho et al. (2019) for recent reviews of the IML and explainable artificial intelligence literature. Although most of these interpretability techniques were not necessarily designed for the purpose, they are increasingly being used to mine large and complex data sets to generate new insights (Roscher et al., 2020). These so-called data-driven discoveries are especially important to advance data-rich fields in science, technology, and medicine. While prior reviews focus mainly on IML techniques, we primarily review how IML methods promote data-driven discoveries, the challenges associated with this task, and related new research opportunities at the intersection of machine learning and statistics.
In the sciences and beyond, IML techniques are routinely employed to make new discoveries from large and complex data sets; to motivate our review on this topic, we highlight several examples. First, feature importance and feature selection in supervised learning are popular forms of interpretation that have led to major discoveries, including new genomic biomarkers of diseases (Guyon et al., 2002), physical laws governing dynamical systems (Brunton et al., 2016), and lesions and other abnormalities in radiology (Borjali et al., 2020; Reyes et al., 2020).
While most of the IML literature focuses on supervised learning (Molnar, 2022; Lipton, 2018; Guidotti et al., 2018; Doshi-Velez & Kim, 2017), there have been many major scientific discoveries made via unsupervised techniques and we argue that these approaches
Aug-2-2023