blb
Statistical inference in massive datasets by empirical likelihood
Ma, Xuejun, Wang, Shaochen, Zhou, Wang
With the rapid development of science and technology, massive data can be collected at high speed, especially in the internet and financial sectors. It is generally recognized that estimation and inference are two major challenges in large-scale learning because of the large amount of computation involved. For statistical inference on massive data sets, Kleiner et al. (2014) proposed the Bag of Little Bootstraps (BLB) to assess the quality of estimators. However, they used only a small number of random subsets, and only part of the observations from each subset. This implies a loss of efficiency in applications.
The Big Data Bootstrap
Kleiner, Ariel, Talwalkar, Ameet, Sarkar, Purnamrita, Jordan, Michael
The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. As an alternative, we present the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality. BLB is well suited to modern parallel and distributed computing architectures and retains the generic applicability, statistical efficiency, and favorable theoretical properties of the bootstrap. We provide the results of an extensive empirical and theoretical investigation of BLB's behavior, including a study of its statistical correctness, its large-scale implementation and performance, selection of hyperparameters, and performance on real data.
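The combination of subsampling and bootstrapping described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the parameters `gamma`, `s`, and `r`, their default values, and the choice of the sample mean as the estimator are all assumptions made for illustration. The sketch draws a few small subsets without replacement, resamples each one up to the full sample size n by tracking multiplicities, and averages the per-subset standard errors.

```python
import random
import statistics
from collections import Counter


def weighted_mean(values, counts):
    """Mean of a resample stored as distinct values plus their multiplicities."""
    total = sum(counts)
    return sum(v * c for v, c in zip(values, counts)) / total


def blb_standard_error(data, gamma=0.6, s=5, r=50, seed=0):
    """Illustrative BLB-style standard error of the sample mean.

    All parameter names and defaults are assumptions for this sketch:
    gamma sets the subset size b = n**gamma, s is the number of subsets,
    and r is the number of resamples drawn per subset.
    """
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))
    per_subset = []
    for _ in range(s):
        # Subsampling step: b distinct observations, drawn without replacement.
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(r):
            # Bootstrap step: a resample of size n over only b distinct points,
            # represented by how many times each point was drawn.
            draws = Counter(rng.choices(range(b), k=n))
            counts = [draws[i] for i in range(b)]
            estimates.append(weighted_mean(subset, counts))
        # Quality measure for this subset: spread of the resampled estimates.
        per_subset.append(statistics.stdev(estimates))
    # Final answer: average the quality measure across subsets.
    return statistics.fmean(per_subset)


if __name__ == "__main__":
    gen = random.Random(42)
    sample = [gen.gauss(0.0, 1.0) for _ in range(2000)]
    print(blb_standard_error(sample))
```

For standard normal data with n = 2000, the result should be close to 1/sqrt(2000), roughly 0.022. Each resample only ever touches b distinct values, and the subsets are processed independently of one another, which is what makes this style of procedure a natural fit for parallel execution.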