Optimal Subsampling with Influence Functions

Daniel Ting, Eric Brochu

Neural Information Processing Systems

As the amount of data increases, the question arises of how best to deal with large datasets. While computational platforms such as Spark [28] and Ray [23] help process large datasets once a desired model is chosen, simply using smaller data can be a faster solution for exploratory data modeling, rapid prototyping, or other tasks where the accuracy obtainable from the full dataset is not needed.



aff1621254f7c1be92f64550478c56e6-Paper.pdf

Neural Information Processing Systems

Statistical leverage scores have emerged as a fundamental tool for matrix sketching and column sampling, with applications to low-rank approximation, regression, random feature learning, and quadrature.