Fast, Distribution-free Predictive Inference for Neural Networks with Coverage Guarantees
Gao, Yue, Raskutti, Garvesh, Willett, Rebecca
–arXiv.org Artificial Intelligence
To assess the accuracy of parameter estimates or predictions without specific distributional knowledge of the data, the idea of re-sampling or sub-sampling the available data to construct prediction intervals is long-established, and there is a rich history in the statistics literature on jackknife and bootstrap methods; see Stine (1985), Efron (1979), Quenouille (1949), and Efron and Gong (1983). Among these re-sampling methods, leave-one-out methods (generally referred to as "cross-validation" or "jackknife") are widely used to assess or calibrate predictive accuracy, and appear in a large body of literature (Stone, 1974; Geisser, 1975). While extensive past work has demonstrated that jackknife-type methods have reliable empirical performance, the theoretical properties of these methods were studied relatively little until recently; see Steinberger and Leeb (2018) and Bousquet and Elisseeff (2002). One of the most important theoretically guaranteed results is Foygel Barber et al. (2019), which introduces a crucial modification of the traditional jackknife method that permits rigorous coverage guarantees of at least 1 − 2α regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. We revisit this work and give more details in Section 2.1. Although jackknife+ has been proven to have coverage guarantees without distributional assumptions, in practice the method is computationally costly, since it requires training n (the training sample size) leave-one-out models from scratch to form the predictive interval. Especially for large and complicated models such as neural networks, this computational cost is prohibitive. The goal of this paper is to provide a fast algorithm with theoretical coverage guarantees similar to those of jackknife+. To achieve this goal, we develop a new procedure, called Differentially Private Lazy Predictive Inference (DP-Lazy PI), which combines two ideas: lazy training of neural networks and differentially private stochastic gradient descent (DP-SGD).
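The n-model training cost mentioned above comes from the standard jackknife+ construction of Foygel Barber et al. (2019). Below is a minimal sketch of that baseline construction (not of the DP-Lazy PI procedure itself), under the assumption of a user-supplied training helper `fit` that retrains a model from scratch and treats the training points symmetrically; the helper name and the toy regressor are illustrative, not part of the paper.

```python
import numpy as np

def jackknife_plus_interval(X, y, x_new, fit, alpha=0.1):
    """Sketch of the standard jackknife+ interval (Foygel Barber et al., 2019).

    `fit(X_train, y_train)` is an assumed helper: it trains a model from scratch
    on the given data and returns a predictor (a callable), treating the
    training points symmetrically.
    """
    n = len(y)
    lower_scores, upper_scores = [], []
    for i in range(n):
        # The expensive step: each leave-one-out model is retrained from scratch.
        mask = np.arange(n) != i
        model_i = fit(X[mask], y[mask])
        r_i = abs(y[i] - model_i(X[i]))   # leave-one-out residual
        mu_i = model_i(x_new)             # leave-one-out prediction at the test point
        lower_scores.append(mu_i - r_i)
        upper_scores.append(mu_i + r_i)
    # Finite-sample quantiles: the floor(alpha*(n+1))-th smallest lower score and
    # the ceil((1-alpha)*(n+1))-th smallest upper score, clipped to valid indices.
    k_lo = max(int(np.floor(alpha * (n + 1))), 1)
    k_hi = min(int(np.ceil((1 - alpha) * (n + 1))), n)
    return np.sort(lower_scores)[k_lo - 1], np.sort(upper_scores)[k_hi - 1]

# Toy usage with an ordinary least-squares line as the underlying regressor.
def fit(Xtr, ytr):
    coefs = np.polyfit(Xtr[:, 0], ytr, 1)          # fit once on the n-1 points
    return lambda x: coefs[0] * x[0] + coefs[1]    # return the trained predictor

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = 2 * X[:, 0] + rng.normal(size=50)
print(jackknife_plus_interval(X, y, np.array([0.5]), fit, alpha=0.1))
```

The loop makes the computational bottleneck explicit: one full retraining per training point, which is exactly the cost that becomes prohibitive for neural networks and that DP-Lazy PI is designed to avoid.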
Jun-11-2023