Statistical and Computational Guarantees for Influence Diagnostics
Fisher, Jillian, Liu, Lang, Pillutla, Krishna, Choi, Yejin, Harchaoui, Zaid
Statistical machine learning models are increasingly used in fully or partially automated data analysis pipelines and artificial intelligence applications (Rudin, 2019). The automation of decisions that affect society has inspired a parallel effort to develop methods that identify the factors behind specific decisions. Heightened scrutiny of how statistical models now operate, at large scale and at a fast pace, has led to renewed interest in statistical diagnostics such as the influence function (Cook and Weisberg, 1982; Koh and Liang, 2017; Schioppa et al., 2022; Louvet et al., 2022). The influence function, or influence curve, of a statistical estimator was proposed to measure the sensitivity of the estimator to individual datapoints. Computing the influence of a particular datapoint boils down to computing an inverse-Hessian-vector product. Because earlier work focused largely on least-squares-type estimators with small samples, the computational aspects received relatively little attention until recently (Koh and Liang, 2017; Schioppa et al., 2022), while the statistical analysis has mainly relied on classical large-sample asymptotics (Rousseeuw et al., 2011; Avella-Medina, 2017). The statistical analysis of influence functions for generalized linear models presents several challenges.
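To make the inverse-Hessian-vector product concrete, the following is a minimal sketch for L2-regularized logistic regression, not the authors' implementation. It assumes the common convention (as in Koh and Liang, 2017) that the influence of datapoint z_i on the parameters is -H^{-1} g_i, where H is the Hessian of the regularized empirical risk at the fitted parameters and g_i is the gradient of the unregularized loss at z_i. The function names (`fit_logistic`, `hessian`, `influence`), the penalty `lam`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def hessian(X, theta, lam):
    """Hessian of the L2-regularized mean logistic loss at theta."""
    n, d = X.shape
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)                      # per-sample curvature weights
    return (X.T * w) @ X / n + lam * np.eye(d)

def fit_logistic(X, y, lam=0.1, iters=50):
    """Plain Newton iterations (illustrative; no line search)."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n + lam * theta
        theta = theta - np.linalg.solve(hessian(X, theta, lam), grad)
    return theta

def influence(X, y, theta, i, lam=0.1):
    """Influence of datapoint i on the parameters: -H^{-1} g_i."""
    g_i = (sigmoid(X[i] @ theta) - y[i]) * X[i]   # per-sample loss gradient
    # Solve H v = g_i instead of forming the inverse explicitly.
    return -np.linalg.solve(hessian(X, theta, lam), g_i)

# Synthetic data (assumption: only for demonstration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.0, -0.5, 0.25])
y = (rng.uniform(size=200) < sigmoid(X @ theta_true)).astype(float)

lam = 0.1
theta_hat = fit_logistic(X, y, lam=lam)
infl_0 = influence(X, y, theta_hat, i=0, lam=lam)
print(infl_0)
```

The dense solve costs on the order of d^3 operations for d parameters, which is fine for a small model like this one. For large models, one would typically replace it with an iterative solver (for example conjugate gradient) that only needs Hessian-vector products.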
Sep-19-2023