Can we trust the bootstrap in high-dimension?

Noureddine El Karoui, Elizabeth Purdom

arXiv.org Machine Learning 

The bootstrap [15] is a ubiquitous tool in applied statistics, allowing for inference when very little is known about the properties of the data-generating distribution. The bootstrap is powerful in applied settings because it does not make the strong assumptions about this distribution that are common in classical statistical theory. Instead, the bootstrap resamples the observed data to create an estimate, F̂, of the unknown data-generating distribution, F; F̂ then forms the basis of further inference. Since its introduction, a large body of research has explored the theoretical properties of the bootstrap, improvements for estimating F under different scenarios, and how to most effectively estimate different quantities from F̂ (see the pioneering [6] for instance, many more references in the book-length review of [8], and [61] for a short summary of the modern point of view on these questions). Other resampling techniques, such as subsampling, the m-out-of-n bootstrap, and the jackknife, have of course been studied and much discussed (see [16], [31], [53], [5], and [18] for a practical introduction). An important limitation of the bootstrap is the quality of F̂. The standard bootstrap estimate of F, based on the empirical distribution of the data, may be poor when the data have a nontrivial dependency structure, when the quantity being estimated, such as a quantile, is sensitive to the discreteness of F̂, or when the functionals of interest are not smooth (see e.g. [6] for a classic reference, as well as [3] or [14] in the context of multivariate statistics).
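To make the resampling idea concrete, the following is a minimal sketch of the percentile bootstrap for a confidence interval on a sample mean, assuming i.i.d. data. Drawing with replacement from the observed sample is equivalent to sampling from the empirical distribution F̂; the function names and the toy sample are illustrative, not from the paper.

```python
import random


def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for stat(data).

    Resampling with replacement from `data` approximates draws from the
    empirical distribution F-hat; the statistic is recomputed on each
    resample and the CI is read off the sorted bootstrap replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(
        stat([rng.choice(data) for _ in range(n)])  # one resample from F-hat
        for _ in range(n_boot)
    )
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def mean(xs):
    return sum(xs) / len(xs)


# Illustrative data only.
sample = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1, 2.0, 2.2]
lo, hi = bootstrap_ci(sample, mean)
```

Note that this uses only the empirical distribution of the data as F̂, which is exactly the estimate whose quality the paragraph above flags as the key limitation (dependent data, discreteness-sensitive quantities, non-smooth functionals).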
