Can we trust the bootstrap in high-dimension?

Noureddine El Karoui, Elizabeth Purdom

arXiv.org Machine Learning 

The bootstrap [15] is a ubiquitous tool in applied statistics, allowing for inference when very little is known about the properties of the data-generating distribution. The bootstrap is powerful in applied settings because it does not make the strong assumptions about this distribution that are common in classical statistical theory. Instead, the bootstrap resamples the observed data to create an estimate, F̂, of the unknown data-generating distribution, F; F̂ then forms the basis of further inference. Since its introduction, a large body of research has explored the theoretical properties of the bootstrap, improvements for estimating F under different scenarios, and how to most effectively estimate different quantities from F̂ (see the pioneering [6] for instance, many more references in the book-length review of [8], and [61] for a short summary of the modern point of view on these questions). Other resampling techniques, such as subsampling, the m-out-of-n bootstrap, and the jackknife, have of course been studied and much discussed (see [16], [31], [53], [5], and [18] for a practical introduction). An important limitation of the bootstrap is the quality of F̂. The standard bootstrap estimate of F, based on the empirical distribution of the data, may be poor when the data have a nontrivial dependency structure, when the quantity being estimated, such as a quantile, is sensitive to the discreteness of F̂, or when the functionals of interest are not smooth (see e.g. [6] for a classic reference, as well as [3] or [14] in the context of multivariate statistics).
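To make the resampling idea concrete, the following is a minimal sketch of the percentile bootstrap for a confidence interval on a sample mean, assuming i.i.d. data. Drawing with replacement from the observed sample is equivalent to sampling from the empirical distribution F̂; the function names and the toy sample are illustrative, not from the paper.

```python
import random


def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for stat(data).

    Resampling with replacement from `data` approximates draws from the
    empirical distribution F-hat; the statistic is recomputed on each
    resample and the CI is read off the sorted bootstrap replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(
        stat([rng.choice(data) for _ in range(n)])  # one resample from F-hat
        for _ in range(n_boot)
    )
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def mean(xs):
    return sum(xs) / len(xs)


# Illustrative data only.
sample = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1, 2.0, 2.2]
lo, hi = bootstrap_ci(sample, mean)
```

Note that this uses only the empirical distribution of the data as F̂, which is exactly the estimate whose quality the paragraph above flags as the key limitation (dependent data, discreteness-sensitive quantities, non-smooth functionals).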
