conditional random sampling
One sketch for all: Theory and Application of Conditional Random Sampling
Conditional Random Sampling (CRS) was originally proposed for efficiently computing pairwise ( l_2, l_1) distances, in static, large-scale, and sparse data sets such as text and Web data. It was previously presented using a heuristic argument. This study extends CRS to handle dynamic or streaming data, which much better reflect the real-world situation than assuming static data. Compared with other known sketching algorithms for dimension reductions such as stable random projections, CRS exhibits a significant advantage in that it is one-sketch-for-all.'' In particular, we demonstrate that CRS can be applied to efficiently compute the l_p distance and the Hilbertian metrics, both are popular in machine learning.
Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data
We1 develop Conditional Random Sampling (CRS), a technique particularly suit- able for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating pairwise l2 and l1 distances and comparing CRS with random projections. For boolean (0/1) data, CRS is provably better than random projections.
One sketch for all: Theory and Application of Conditional Random Sampling
Li, Ping, Church, Kenneth W., Hastie, Trevor J.
Conditional Random Sampling (CRS) was originally proposed for efficiently computing pairwise ($l_2$, $l_1$) distances, in static, large-scale, and sparse data sets such as text and Web data. It was previously presented using a heuristic argument. This study extends CRS to handle dynamic or streaming data, which much better reflect the real-world situation than assuming static data. Compared with other known sketching algorithms for dimension reductions such as stable random projections, CRS exhibits a significant advantage in that it is one-sketch-for-all.'' In particular, we demonstrate that CRS can be applied to efficiently compute the $l_p$ distance and the Hilbertian metrics, both are popular in machine learning.
One sketch for all: Theory and Application of Conditional Random Sampling
Li, Ping, Church, Kenneth W., Hastie, Trevor J.
Conditional Random Sampling (CRS) was originally proposed for efficiently computing pairwise ($l_2$, $l_1$) distances, in static, large-scale, and sparse data sets such as text and Web data. It was previously presented using a heuristic argument. This study extends CRS to handle dynamic or streaming data, which much better reflect the real-world situation than assuming static data. Compared with other known sketching algorithms for dimension reductions such as stable random projections, CRS exhibits a significant advantage in that it is ``one-sketch-for-all.'' In particular, we demonstrate that CRS can be applied to efficiently compute the $l_p$ distance and the Hilbertian metrics, both are popular in machine learning. Although a fully rigorous analysis of CRS is difficult, we prove that, with a simple modification, CRS is rigorous at least for an important application of computing Hamming norms. A generic estimator and an approximate variance formula are provided and tested on various applications, for computing Hamming norms, Hamming distances, and $\chi^2$ distances.